Forum Discussion

New member | Level 2

2 years ago

reading parquet file using python sdk

Hi, I am trying to read a parquet file using pandas and vaex. I can successfully read a .csv, but I get the following error message when I try to download the parquet file with dbx.files_download: ...

API

Greg-DB
2 years ago
Thanks for following up and sharing that. For reference, the URL parameters for shared links like that are documented here.

Greg-DB

Dropbox Staff

2 years ago

I see you're getting a 'not_file' error, which means: "We were expecting a file, but the given path refers to something that isn’t a file."

It looks like the ".parquet" is not a file, but rather a sort of folder, possibly referred to as a "package" or "bundle" in some environments.

That being the case, to download that you would instead need to use files_download_zip (or files_download_zip_to_file) and then unzip the downloaded zip file, or walk through the contents using files_list_folder/files_list_folder_continue and then download each individual nested item using files_download (or files_download_to_file). The first option of using files_download_zip is probably better/faster.
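As a rough sketch of the first option, the zip could be saved to disk with files_download_zip_to_file and then unpacked with Python's standard zipfile module. The helper name extract_folder_zip and the local paths are hypothetical, chosen only for illustration, and the Dropbox call is left as a comment because it needs an authenticated client.

```python
import zipfile
from pathlib import Path


def extract_folder_zip(zip_path, dest_dir):
    """Unpack a zip saved to disk and return the extracted file paths, sorted."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Skip directory entries, which end with a slash inside the archive.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    return sorted(dest / n for n in names)


# Hypothetical usage, assuming `dbx` is an authenticated dropbox.Dropbox client:
# dbx.files_download_zip_to_file("County.zip", "/County.parquet")
# part_files = extract_folder_zip("County.zip", "County_unzipped")
```

This needs a single download call for the whole folder, which is why it is likely to be faster than fetching each nested item separately.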

notoriusjack
New member | Level 2
2 years ago
Thanks for replying. I have tried one of your solutions, but it's really slow.

import io

import pandas as pd
import vaex

# dbx is an authenticated dropbox.Dropbox client created earlier
file_list = []
for entry in dbx.files_list_folder(path="/County.parquet").entries:
    print(entry.path_lower)
    # One API call per nested file, which is why this is slow
    _, dwnld_file = dbx.files_download(entry.path_lower)
    with io.BytesIO(dwnld_file.content) as stream:
        pd_df = pd.read_parquet(stream)  # this works
    vdf = vaex.from_pandas(pd_df)
    del pd_df
    file_list.append(vdf)
conc_df = vaex.concat(file_list)
print(conc_df)
I have also tried dbx.files_download_zip, but I can't find a way to read the data it returns. I get this error: 'utf-8' codec can't decode byte 0x82 in position 12: invalid start byte.
- Greg-DB
  Dropbox Staff
  2 years ago
  Yes, using files_list_folder/files_list_folder_continue and files_download (or files_download_to_file) requires more API calls so it would be less performant.
  
  The files_download_zip method would return the requested data the same way files_download would, except that it would be zip data that you would need to unzip to access the original folder. Also, note that you can use files_download_zip_to_file if you want to save the zip data to a file. Can you share the code you're having trouble with for that, and indicate which line fails with that error?
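For the listing approach, one detail worth sketching is pagination: files_list_folder may return only part of a folder, with has_more set on the result, in which case the rest is fetched through files_list_folder_continue using the returned cursor. The helper name list_all_entries below is hypothetical.

```python
def list_all_entries(dbx, path):
    """Return every entry under `path`, following the cursor until has_more is False."""
    result = dbx.files_list_folder(path)
    entries = list(result.entries)
    while result.has_more:
        # Each continue call takes the cursor from the previous result.
        result = dbx.files_list_folder_continue(result.cursor)
        entries.extend(result.entries)
    return entries


# Hypothetical usage, assuming `dbx` is an authenticated dropbox.Dropbox client:
# for entry in list_all_entries(dbx, "/County.parquet"):
#     print(entry.path_lower)
```

Each nested file still costs one files_download call afterwards, so this does not remove the performance difference described above.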
  - notoriusjack
    New member | Level 2
    2 years ago
    When I try:
    
    md, zipFile = dbx.files_download_zip('/County_test.parquet')
    with ZipFile(zipFile, 'r') as zip:
    The second line (with ZipFile(zipFile, 'r') as zip:) fails with AttributeError: 'Response' object has no attribute 'seek'
    
    I can't understand how to solve this
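
The AttributeError arises because files_download_zip returns a (metadata, response) pair whose second item is a requests Response, not a file object, while ZipFile expects a path or a seekable file-like object. One possible way around it, sketched below, is to wrap the raw bytes from response.content in io.BytesIO. The helper name read_zip_members and the ".parquet" suffix filter are assumptions made for illustration. The filter is meant to skip any non-parquet items that might sit in the folder. The earlier 'utf-8' decoding error is likely the same underlying issue: zip data is binary and should be read as bytes rather than decoded as text.

```python
import io
import zipfile


def read_zip_members(zip_bytes, suffix=".parquet"):
    """Map member name -> raw bytes for files in the zip whose name ends with suffix."""
    members = {}
    # BytesIO gives ZipFile the seekable file-like object it requires.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.endswith(suffix):
                continue
            members[info.filename] = archive.read(info)
    return members


# Hypothetical usage, assuming `dbx` is an authenticated dropbox.Dropbox client
# and pandas is imported as pd:
# md, response = dbx.files_download_zip("/County_test.parquet")
# parts = read_zip_members(response.content)
# frames = [pd.read_parquet(io.BytesIO(data)) for data in parts.values()]
```

Keeping everything in memory avoids temporary files, at the cost of holding the whole zip in RAM. For large folders, saving with files_download_zip_to_file may be the safer choice.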
