I have a .parquet file and would like to use Python to query it quickly and efficiently by the value in a column.
For example, the file might have a column called name, and I want to get back the first row (or all of the rows) with a chosen name.
How can I query a parquet file like this in the Polars API, or possibly FastParquet (whichever is faster)?
I thought pl.scan_parquet might be helpful, but it did not seem to do what I wanted, or perhaps I just did not understand it. Preferably, though it is not essential, the query would not have to read the entire file into memory first, to reduce memory and CPU usage.
I thank you for your help.