Is there a way to perform lazy operations in Polars on multiple frames at the same time?

Question

Lazy mode and query optimization in Polars are great tools for saving memory allocations and CPU usage on a single data frame. I wonder if there is a way to do the same across multiple lazy frames, for example:

lpdf1 = pdf1.lazy()
lpdf2 = pdf2.lazy()

result_lpdf = -lpdf1/lpdf2
result_pdf = result_lpdf.collect()

The above code will not run, as division and negation are not implemented for LazyFrame. My aim is to create the new result_pdf frame without creating one temporary frame for the division and then another for the negation (as would be the case in pandas and NumPy).

I'm trying to get some performance improvement relative to -pdf1/pdf2, on frames of size (283681, 93). Any suggestions are welcome.

"*as it would be the case in pandas and numpy*" Not if you use in-place Numpy operations (which is possible for the division/subtraction). — Jérôme Richard, Dec 17 '22 at 14:35
That is true, also if using `numexpr`. Nonetheless my aim would be to replace `numpy` in favor of `polars`. — Mark Horvath, Dec 17 '22 at 14:43

score 3 · Accepted Answer · answered Dec 17 '22 at 15:44

You can use `.with_context()`, which makes the columns of another LazyFrame available to the expressions of the current one.

Adding a suffix to one set of columns lets you tell the two sets apart.

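import polars as pl

left = pl.DataFrame(dict(a=[-16, -12, -9], b=[20, 12, 10])).lazy()
right = pl.DataFrame(dict(a=[4, 3, 3], b=[10, 2, 5])).lazy()
(
    left
    # Expose right's columns to left's expressions, renamed to a_right, b_right
    .with_context(right.select(pl.all().suffix("_right")))
    .select(
        # Negate each left column and divide by the matching right column
        pl.col(name) * -1 / pl.col(f"{name}_right")
        for name in left.columns
    )
    .collect()
)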
shape: (3, 2)
┌─────┬──────┐
│ a   ┆ b    │
│ --- ┆ ---  │
│ f64 ┆ f64  │
╞═════╪══════╡
│ 4.0 ┆ -2.0 │
├─────┼──────┤
│ 4.0 ┆ -6.0 │
├─────┼──────┤
│ 3.0 ┆ -2.0 │
└─────┴──────┘

Great trick! Seems like what I was looking for, yet there is one last bit I still don't follow. Would you expect this to be faster than simply doing `-pdf1/pdf2` in eager mode? I'm finding the two have very similar performance. Could it be that this still allocates temporary frames (or series)? I'm editing the question to include my frame shapes. — Mark Horvath, Dec 17 '22 at 17:03
