How to convert parquet to json

Question

I have parquet files hosted on S3 that I want to download and convert to JSON. I was able to use select_object_content to output certain files as JSON using SQL in the past. I need to find a faster way to do it because it is timing out for larger files.

I have tried the following:

import pandas as pd

df = pd.read_parquet(s3_location)  # s3_location: path or s3:// URI of the parquet file
json_str = df.to_json(orient="records")

However, the JSON output from the above code uses flattened key paths (hotels.date.hotel_price) instead of nested objects (hotels: {date: {hotel_price: 100}}).

Does anyone know of a way to make it come out as the second, nested type of JSON?
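One way to get from the first form to the second is to rebuild the nesting after loading. The following is only a minimal sketch, assuming the dotted names are literal column names in the parquet file and that "." never appears inside a real field name. The helper name unflatten and the sample record are hypothetical, made up for illustration.

```python
import json


def unflatten(record, sep="."):
    """Turn {"a.b.c": 1} into {"a": {"b": {"c": 1}}}.

    Assumes no key is both a leaf and a prefix of another key
    (e.g. "a" and "a.b" in the same record), which this sketch does not handle.
    """
    nested = {}
    for dotted_key, value in record.items():
        parts = dotted_key.split(sep)
        node = nested
        # Walk (or create) the intermediate dicts for every part but the last.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


# Hypothetical flat records, shaped like the output of df.to_dict(orient="records").
flat_records = [
    {"hotels.date.hotel_price": 100, "hotels.date.currency": "USD", "id": 1},
]

nested_records = [unflatten(r) for r in flat_records]
print(json.dumps(nested_records))
# [{"hotels": {"date": {"hotel_price": 100, "currency": "USD"}}, "id": 1}]
```

With pandas, the flat records would come from df.to_dict(orient="records") rather than df.to_json, so that the nesting can be rebuilt before serializing.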

import pyarrow.parquet as pq; pq.read_table('df.parquet').to_pydict() — Lucas Tieman, Aug 11 '23 at 05:27
Also, Polars supports pagination with its use of the Rust parquet reader. https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.read_parquet.html — Lucas Tieman, Aug 11 '23 at 05:40
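The pyarrow suggestion in the comments can be sketched roughly as follows. This is a sketch under assumptions, not a tested solution for the asker's files: it assumes the nested fields are stored as parquet struct columns (which pyarrow returns as nested dicts), and it uses Table.to_pylist() instead of to_pydict() so that the result is one dict per row, like orient="records". The function names rows_to_json and parquet_to_json are hypothetical.

```python
import json


def rows_to_json(rows):
    """Serialize a list of row dicts to a JSON string.

    default=str is a blunt fallback: values json cannot encode natively
    (dates, timestamps, Decimals) are written as their str() form.
    """
    return json.dumps(rows, default=str)


def parquet_to_json(path):
    """Read a parquet file and return its rows as a JSON array string."""
    # Imported here so that rows_to_json stays usable without pyarrow installed.
    import pyarrow.parquet as pq

    table = pq.read_table(path)
    # to_pylist() yields one dict per row; struct columns become nested dicts.
    return rows_to_json(table.to_pylist())
```

Calling parquet_to_json('df.parquet') would then return the whole file as one JSON string, so for very large files it may be worth processing the table in batches instead.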

score 0 · Answer 1 · answered Sep 26 '22 at 17:21

This might be too late a response, but for anyone else who runs into the same issue: the easiest way is to install the parquet-viewer extension in VS Code, which lets you preview your data as JSON.

Link to the extension, for reference: https://marketplace.visualstudio.com/items?itemName=dvirtz.parquet-viewer
