
I've written a lazy data-processing function with polars to process a large parquet dataset. Is there a way I can select N rows from the parquet file and still get a LazyFrame back? I notice that both .fetch(N) and .head(N) return DataFrames, not LazyFrames. Do I have to do e.g. pl.scan_parquet(filename).fetch(100_000).lazy()?

My dataset does not have a monotonically increasing id column.

The intention is to see if my function finishes in reasonable time on a large slice of the dataset.
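For context, a minimal sketch of the workaround described above and of how the pipeline gets timed on a slice; my_pipeline, the data.parquet path, and the "value" column are placeholders, not part of the real code:

import time
import polars as pl

def my_pipeline(lf: pl.LazyFrame) -> pl.LazyFrame:
    # placeholder for the real lazy processing function
    return lf.filter(pl.col("value") > 0)

# current workaround: fetch N rows eagerly, then wrap them back into a LazyFrame
lf_slice = pl.scan_parquet("data.parquet").fetch(100_000).lazy()

start = time.perf_counter()
result = my_pipeline(lf_slice).collect()
print(f"processed {result.height} rows in {time.perf_counter() - start:.2f}s")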

TomNorway

1 Answer


I had simply overlooked .limit(). Usage is then:

pl.scan_parquet(filename).limit(n=N)

It looks to me like the .fetch() operation recursively pushes a .limit through the lazy query, which allows fast debugging.
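For illustration, a minimal sketch of the difference between the two, assuming a placeholder data.parquet file:

import polars as pl

# .limit() keeps the query lazy; the row cap is applied when the query is run
lazy_slice = pl.scan_parquet("data.parquet").limit(n=100_000)
print(type(lazy_slice).__name__)   # LazyFrame

# .fetch() runs the query eagerly on a bounded number of rows and returns a DataFrame
eager_slice = pl.scan_parquet("data.parquet").fetch(100_000)
print(type(eager_slice).__name__)  # DataFrame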

TomNorway