Forum Discussion
notoriusjack
2 years ago
New member | Level 2
reading parquet file using python sdk
Hi, I am trying to read a parquet file using pandas and vaex. I can successfully read a .csv, but I get the following error message when I try to download the parquet file with dbx.files_download: ...
- 2 years ago
Thanks for following up and sharing that. For reference, the URL parameters for shared links like that are documented here.
notoriusjack
New member | Level 2
Thanks for replying. I tried one of your solutions, but it's really slow.
import io
import pandas as pd
import vaex

# dbx is an authenticated dropbox.Dropbox client created earlier.
file_list = []
for entry in dbx.files_list_folder(path="/County.parquet").entries:
    print(entry.path_lower)
    # One files_download call per file in the folder.
    _, dwnld_file = dbx.files_download(entry.path_lower)
    with io.BytesIO(dwnld_file.content) as stream:
        pd_df = pd.read_parquet(stream)  # this works
    vdf = vaex.from_pandas(pd_df)
    del pd_df
    file_list.append(vdf)
conc_df = vaex.concat(file_list)
print(conc_df)
I have also tried dbx.files_download_zip, but I can't find a way to read the data it returns. I get: 'utf-8' codec can't decode byte 0x82 in position 12: invalid start byte.
Greg-DB
2 years ago
Dropbox Staff
Yes, using files_list_folder/files_list_folder_continue and files_download (or files_download_to_file) requires more API calls so it would be less performant.
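As a rough sketch of the listing pattern mentioned above, the helper below pages through a folder with files_list_folder and files_list_folder_continue. The name iter_folder_entries is hypothetical, and dbx is assumed to be an authenticated dropbox.Dropbox client. Each entry it yields would still need its own files_download call, which is where the extra requests come from.

```python
def iter_folder_entries(dbx, folder_path):
    """Yield every entry in a Dropbox folder, following pagination cursors.

    `dbx` is assumed to be an authenticated dropbox.Dropbox client
    (hypothetical helper, not part of the SDK).
    """
    result = dbx.files_list_folder(path=folder_path)
    while True:
        yield from result.entries
        if not result.has_more:
            break
        # Fetch the next page using the cursor from the previous result.
        result = dbx.files_list_folder_continue(result.cursor)
```

It could be used as `for entry in iter_folder_entries(dbx, "/County.parquet"): ...`, so that folders with more entries than fit in one page are not silently truncated.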
The files_download_zip method would return the requested data the same way files_download would, except that it would be zip data that you would need to unzip to access the original folder. Also, note that you can use files_download_zip_to_file if you want to save the zip data to a file. Can you share the code you're having trouble with for that, and indicate which line fails with that error?
- notoriusjack · 2 years ago · New member | Level 2
When I try:
md, zipFile = dbx.files_download_zip('/County_test.parquet')
with ZipFile(zipFile, 'r') as zip:
the line with ZipFile(zipFile, 'r') as zip: fails with AttributeError: 'Response' object has no attribute 'seek'.
I can't understand how to solve this
- Greg-DB · 2 years ago · Dropbox Staff
The files_download_zip method works like the files_download method, in that the second value it returns is the response object. To access the data from the response object, you would access the 'content' field like you did in your other code snippet. So, in this case, it would be 'zipFile.content'.
Beyond that, refer to the documentation for ZipFile, BytesIO, pandas, etc., for information on using those. Those aren't made by Dropbox so I can't offer support for those in particular.
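To illustrate the point about the 'content' field, here is a minimal sketch that runs without Dropbox at all. ZipFile needs a seekable file-like object, so the raw bytes are wrapped in io.BytesIO first. The helper name open_zip_from_bytes and the in-memory archive standing in for zipFile.content are assumptions made for the example only.

```python
import io
from zipfile import ZipFile

def open_zip_from_bytes(zip_bytes):
    """Wrap raw zip bytes in a seekable buffer so ZipFile can read them."""
    return ZipFile(io.BytesIO(zip_bytes), "r")

# Build a small zip archive in memory to stand in for `zipFile.content`.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("County.parquet/part-0.parquet", b"placeholder bytes")
fake_content = buffer.getvalue()

# Open the archive from bytes and list its members.
with open_zip_from_bytes(fake_content) as zip_ref:
    member_names = zip_ref.namelist()
```

With the real SDK, the bytes would come from zipFile.content instead of fake_content.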
- notoriusjack · 2 years ago · New member | Level 2
Thank you for your support. I just managed to do it with dbx.files_download_zip, but it takes more or less the same time to process.
Do you know if pandas or vaex support reading the data directly from a file in Dropbox?
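import io
from zipfile import ZipFile
import pandas as pd
import vaex

# dbx is an authenticated dropbox.Dropbox client created earlier.
md, zipFile = dbx.files_download_zip('/County.parquet')
file_list = []
# zipFile is a Response object; its raw bytes are in zipFile.content.
with ZipFile(io.BytesIO(zipFile.content), 'r') as zip_ref:
    for file in zip_ref.infolist():
        if file.filename.endswith('.parquet'):
            pd_df = pd.read_parquet(zip_ref.open(file.filename))  # this works
            vdf = vaex.from_pandas(pd_df)
            del pd_df
            file_list.append(vdf)
conc_df = vaex.concat(file_list)
print(conc_df)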