I've written a lazy data-processing function with polars to process a large parquet dataset. Is there a way to select N rows from the parquet file and still get back a lazy dataset? I notice that both `.fetch(N)` and `.head(N)` return DataFrames, not LazyFrames. Do I have to do something like `pl.scan_parquet(filename).fetch(100_000).lazy()`?
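For concreteness, this is roughly what I'm doing now (`filename` is a placeholder for my real parquet path, and this assumes a polars version where `LazyFrame.fetch` is available):

```python
import polars as pl

filename = "data.parquet"  # placeholder for my actual parquet dataset

# Current workaround: eagerly materialize 100k rows with fetch(), then wrap
# the resulting DataFrame back into a LazyFrame so the rest of my lazy
# pipeline can be applied to the sample.
sample_lazy = pl.scan_parquet(filename).fetch(100_000).lazy()
```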
My dataset does not have a monotonically increasing `id` column.
The intention is to see if my function finishes in reasonable time on a large slice of the dataset.
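For reference, this is roughly how I'd time the run on the slice; `my_pipeline` is a stand-in for my actual processing function:

```python
import time

import polars as pl

filename = "data.parquet"  # placeholder for my actual parquet dataset


def my_pipeline(lf: pl.LazyFrame) -> pl.LazyFrame:
    # Stand-in for my real lazy processing function.
    return lf


# Take a 100k-row sample via the workaround above, then time the full
# lazy pipeline on it.
sample_lazy = pl.scan_parquet(filename).fetch(100_000).lazy()

start = time.perf_counter()
result = my_pipeline(sample_lazy).collect()
print(f"Processed {result.height} rows in {time.perf_counter() - start:.1f}s")
```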