
I have a bunch of JSON files, each containing a list of dicts:

[
  {"a": 5.523275199811905e-05, "b": 0.0016375015413388609, "c": -0.0166875154286623, "d": -0.06936456641533968, "e": -0.06665282790239256, "f": -0.13665749109868952, "g": 1670519207414, "h": 1670519204046, "y": ""},
  {"a": 5.523275199811905e-05, "b": 0.0016375015413388609, "c": -0.0166875154286623, "d": -0.06936456641533968, "e": -0.06665282790239256, "f": -0.13665749109868952, "g": 1670519207414, "h": 1670519204046, "y": ""}
]

Splitting the dicts up and saving each one to a separate CSV file worked, but I need parquet files instead - one per dataset. Is there a way to solve this without Spark?
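
A minimal sketch of one way to do this with plain pandas (no Spark); the data/ directory and the output file naming here are assumptions, not from the question:

import json
from pathlib import Path

import pandas as pd

# Assumed layout: every *.json file under data/ holds a list of dicts as above.
for json_path in Path("data").glob("*.json"):
    with open(json_path) as f:
        records = json.load(f)
    for i, record in enumerate(records):
        # One-row DataFrame per dict; to_parquet needs pyarrow or fastparquet installed.
        pd.DataFrame([record]).to_parquet(f"{json_path.stem}_{i}.parquet", index=False)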

I managed to produce a parquet file with partitioning, but that is not actually what I need:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# df holds the records loaded from one JSON file
table_from_pandas = pa.Table.from_pandas(df)
pq.write_to_dataset(table_from_pandas,
                    root_path='ddp_final/handytracking.parquet',
                    partition_cols=['timestamp'])
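
For comparison, a sketch of writing the same table to a single, unpartitioned file (assuming the same df as above) would use pq.write_table instead of pq.write_to_dataset:

# write_table produces exactly one parquet file instead of a partitioned directory tree
pq.write_table(pa.Table.from_pandas(df), 'ddp_final/handytracking.parquet')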
  • pandas is able to write parquet files, so you can export your csv as parquet with it. See here: https://pandas.pydata.org/pandas-docs/version/1.1/reference/api/pandas.DataFrame.to_parquet.html – flipSTAR Dec 15 '22 at 09:35
  • Yes, but that does not support splitting up the datasets in the list and saving each of them to a separate parquet file. – stained Dec 15 '22 at 10:22
  • From reading this: https://stackoverflow.com/a/66582990/13843906 I thought it would work the way you want. – flipSTAR Dec 15 '22 at 10:37

0 Answers