I often need to retrieve a single row from a Polars DataFrame given a set of column values, much as I would use a composite key in a database. This is possible in Polars using DataFrame.row, but the resulting expression is very verbose:

row_index = {'treatment': 'red', 'batch': 'C', 'unit': 76}

row = df.row(by_predicate=(
    (pl.col('treatment') == row_index['treatment'])
    & (pl.col('batch') == row_index['batch'])
    & (pl.col('unit') == row_index['unit'])
))

The most succinct method I've found is:

from functools import reduce
from operator import and_

expr = reduce(and_, (pl.col(k) == v for k, v in row_index.items()))

row = df.row(by_predicate=expr)

But that is still verbose and hard to read. Is there an easier way, perhaps a built-in Polars feature I'm missing?

Etherian

1 Answer

(a == b) & (c == d) will return true if all of the conditions are true.

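This means one can also use pl.all to express the same thing (newer Polars releases spell this row-wise form pl.all_horizontal, and a bare pl.all() there selects all columns):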

pl.all([a == b, c == d])

Similarly, for (a == b) | (c == d) you could use pl.any (pl.any_horizontal in newer Polars releases).

Comprehensions (or generator expressions) can be passed directly without the need for reduce:

df.filter(
   pl.all(pl.col(k) == v for k, v in row_index.items())
)

Or using df.row as in your example:

predicate = pl.all(
   pl.col(k) == v for k, v in row_index.items()
)

df.row(by_predicate=predicate)
jqurious
  • Excellent answer! Thank you for the quick response. Though, since my question specifically mentions `DataFrame.row`, could you include an example with that method as well, to make clear `pl.all` also works with it? To avoid confusion for any future readers. – Etherian Apr 10 '23 at 04:06
  • @Etherian I've expanded a bit on the answer - feel free to edit it if you think it needs improving. – jqurious Apr 10 '23 at 08:06
  • Perfection! Thank you for such a thorough answer. – Etherian Apr 10 '23 at 13:33