I have Parquet files hosted on S3 that I want to download and convert to JSON. In the past I used `select_object_content` (S3 Select) with a SQL expression to output certain files as JSON, but it times out on larger files, so I need a faster approach.
I have tried the following:
```python
import pandas as pd

df = pd.read_parquet(s3_location)
json_str = df.to_json(orient="records")
```
However, the JSON this produces uses flattened, dotted key paths (`"hotels.date.hotel_price": 100`) instead of nested objects (`"hotels": {"date": {"hotel_price": 100}}`).
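One possible workaround, assuming the column names are literally dotted paths such as `hotels.date.hotel_price`, is to rebuild the nesting in plain Python after exporting the records. This is a minimal sketch, not a built-in pandas option; `unflatten` and `records_to_nested_json` are hypothetical helper names, and the code assumes no key is both a leaf and a prefix of another key (for example `a.b` alongside `a.b.c`).

```python
import json
from typing import Any


def unflatten(record: dict[str, Any], sep: str = ".") -> dict[str, Any]:
    """Turn {"a.b.c": 1} into {"a": {"b": {"c": 1}}}.

    Assumption: no key is both a leaf and a prefix of another key,
    e.g. "a.b" and "a.b.c" in the same record would conflict.
    """
    nested: dict[str, Any] = {}
    for key, value in record.items():
        parts = key.split(sep)
        node = nested
        # Walk (or create) one dict per path segment except the last.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        # The last segment holds the actual value.
        node[parts[-1]] = value
    return nested


def records_to_nested_json(records: list[dict[str, Any]], sep: str = ".") -> str:
    """Serialize a list of flat records as a JSON array of nested objects."""
    return json.dumps([unflatten(r, sep) for r in records])


if __name__ == "__main__":
    flat = [{"hotels.date.hotel_price": 100, "hotels.date.hotel_name": "A"}]
    print(records_to_nested_json(flat))
    # → [{"hotels": {"date": {"hotel_price": 100, "hotel_name": "A"}}}]
```

With the DataFrame from above, something like `records_to_nested_json(json.loads(df.to_json(orient="records")))` should work. Round-tripping through `to_json` first lets pandas convert NaN, timestamps, and NumPy scalars into JSON-safe values before the keys are nested.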
Does anyone know a way to get the second, nested form of JSON instead?