Pandas Read/Write Parquet Data using Column Index

Asked Jun 07 '20 at 22:03

Active Jun 07 '20 at 22:03

Viewed 1,218 times

Is it possible to use pandas to selectively read rows from Parquet files using its column index?

Similarly, when writing a Pandas DataFrame to a Parquet file, such as using pd.DataFrame.to_parquet(), is it possible to specify the DataFrame column or index level to be used as the Parquet column index?

I am hoping that using a Parquet index can speed up reads and writes.

Currently using fastparquet 0.4.0, pandas 1.0.3, and Python 3.8.3.

asked Jun 07 '20 at 22:03

Athena Wisdom


Parquet has no direct equivalent of a pandas DataFrame index. However, it can store min/max statistics for each chunk (row group) of the file, which lets a reader skip parts of the file when a filter is applied. This works for any column in the Parquet file, not only an index column, provided the statistics were written. – joris Jun 09 '20 at 13:34


0 Answers
