
How can I define a literal array in a Polars expression? For example, I want to keep only the rows where an expression is true and the corresponding value in a mask is also true.

import polars as pl

df = pl.DataFrame(dict(x=[1,2,3,4,5,6]))
mask = [True, True, False, False, True, True]

df.filter(pl.col('x') % 2 == 0 & pl.lit(mask))
# could not convert value [True, True, False, False, True, True] as a Literal

For this particular example, I could use [] indexing on the data frame and then do the filter, but for more complicated expressions, it would be easier if I could insert the array into the expression.

mkrieger1
drhagen

1 Answer

df.filter((pl.col("x") % 2 == 0) & pl.Series(mask))

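Note the added parentheses. In Python, & binds more tightly than ==, so without them the original is parsed as pl.col("x") % 2 == (0 & pl.lit(mask)). The (pl.col("x") % 2 == 0) part evaluates to a boolean column, which is then ANDed row-wise with pl.Series(mask).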
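To see why the parentheses matter, here is a small sketch in plain Python, with no Polars involved. The names `x` and `flag` are made up for illustration; `flag` stands in for a single mask value.

```python
x = 4          # an even value, so the parity test should pass
flag = False   # hypothetical mask entry that should exclude this row

# & binds tighter than ==, so this is parsed as x % 2 == (0 & flag)
without_parens = x % 2 == 0 & flag

# Here the comparison runs first and its result is ANDed with the flag
with_parens = (x % 2 == 0) & flag

print(without_parens, with_parens)
```

Without parentheses the mask value ends up inside the comparison instead of gating its result, which is not what the filter intends.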

Using a Series also allows you to set the data type of the values, which can come in handy.

The documentation of the Series constructor explains more.

  • So the main point (besides adding parentheses) was to use `pl.Series`, not `pl.lit`? – mkrieger1 Mar 08 '22 at 19:41
  • Correct. `pl.lit` is for broadcasting a single value to a column. `pl.Series` allows you to specify the array of contents for the column. In some sense, `pl.lit(1)` == `pl.Series([1,1,1,1,1,....])` –  Mar 08 '22 at 20:34
  • Interestingly, using a `Series` alone fails; it must be part of a larger expression. – drhagen Mar 12 '22 at 11:25