Is it possible to use pandas to selectively read rows from a Parquet file using its column index?

Similarly, when writing a pandas DataFrame to a Parquet file, for example with pd.DataFrame.to_parquet(), is it possible to specify which DataFrame column or index level is used as the Parquet column index?

I am hoping that using a Parquet index will speed up reads and writes.

I am currently using fastparquet 0.4.0, pandas 1.0.3, and Python 3.8.3.

Athena Wisdom
  • Parquet has no direct equivalent of a pandas DataFrame index. However, it does support keeping min/max statistics for chunks (row groups) of the file, which lets a reader skip parts of the file when a filter is applied. This works for any column in the Parquet file, provided statistics were written. – joris Jun 09 '20 at 13:34
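
To make the min/max idea from the comment above concrete, here is a minimal sketch in plain Python. It is not Parquet's or fastparquet's actual implementation: the `Chunk` class and the `write_chunks` and `read_range` helpers are hypothetical names invented for illustration, and a single integer column is assumed. It only shows how per-chunk statistics let a reader skip chunks that cannot match a range filter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    """Hypothetical stand-in for one chunk of a single column."""
    values: List[int]
    min_value: int  # statistic recorded at write time
    max_value: int  # statistic recorded at write time


def write_chunks(values: List[int], chunk_size: int) -> List[Chunk]:
    """Split the column into chunks and record min/max for each one."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        part = values[start:start + chunk_size]
        chunks.append(Chunk(part, min(part), max(part)))
    return chunks


def read_range(chunks: List[Chunk], low: int, high: int) -> Tuple[List[int], int]:
    """Return the values in [low, high] and the number of chunks scanned."""
    result = []
    scanned = 0
    for chunk in chunks:
        # Skip the chunk when its statistics prove that no value can match.
        if chunk.max_value < low or chunk.min_value > high:
            continue
        scanned += 1
        result.extend(v for v in chunk.values if low <= v <= high)
    return result, scanned


# Sorted data, so each chunk covers a narrow, non-overlapping range.
chunks = write_chunks(list(range(100)), chunk_size=10)
rows, scanned = read_range(chunks, 42, 47)
print(rows, scanned)
```

In this sketch only one of the ten chunks is scanned because the data is sorted on the filtered column. If the values were shuffled, most chunks would have overlapping min/max ranges and few could be skipped, so the benefit depends on how the data is ordered when it is written.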

0 Answers