Use multiple columns in list expression

Question

I want to run search_sorted row by row between the lists in columns a and b, i.e. find where each value in b's list would be inserted into a's list:

import polars as pl

df = pl.DataFrame(
    {
        "a": [[1, 2, 3], [8, 9]],
        "b": [[2], [10, 6]]
    }
)

print(df)

res = df.lazy().with_columns(
    [
        pl.col("a").explode().search_sorted(pl.col("b").explode(), side="left").implode().alias("c")
    ]
)

print(res.collect())

The result I get is:

shape: (2, 3)
┌───────────┬───────────┬───────────┐
│ a         ┆ b         ┆ c         │
│ ---       ┆ ---       ┆ ---       │
│ list[i64] ┆ list[i64] ┆ list[u32] │
╞═══════════╪═══════════╪═══════════╡
│ [1, 2, 3] ┆ [2]       ┆ [1, 5, 3] │
│ [8, 9]    ┆ [10, 6]   ┆ [1, 5, 3] │
└───────────┴───────────┴───────────┘

But I was hoping for the search_sorted to be evaluated per row, not per column.

So that the first row should have the result [1] while the second should have [2, 0].
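To pin down the expected output, here is a minimal plain-Python sketch of the per-row semantics I am after. It does not use polars at all: `bisect.bisect_left` stands in for `search_sorted(..., side="left")`, and the helper name `search_sorted_per_row` is made up for illustration.

```python
from bisect import bisect_left

# Same data as the DataFrame above, as plain Python lists.
a = [[1, 2, 3], [8, 9]]
b = [[2], [10, 6]]


def search_sorted_per_row(a_rows, b_rows):
    """For each row, find the left insertion index of every value of b in a."""
    return [
        [bisect_left(haystack, needle) for needle in needles]
        for haystack, needles in zip(a_rows, b_rows)
    ]


expected = search_sorted_per_row(a, b)
print(expected)  # [[1], [2, 0]]
```

This is only a reference for the desired values; it says nothing about how to express the same thing efficiently as a polars expression.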

How can I do that?

Edit: I understand why the above does not work. Explode concatenates the lists from all rows into one column, so search_sorted runs once over the whole column instead of once per row. But I cannot seem to run search_sorted without exploding the columns.
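The column-wide behaviour can be reproduced in plain Python, which shows where [1, 5, 3] comes from. Again this is only a sketch: `itertools.chain` stands in for explode and `bisect_left` for `search_sorted` with `side="left"`.

```python
from bisect import bisect_left
from itertools import chain

a = [[1, 2, 3], [8, 9]]
b = [[2], [10, 6]]

# "Explode": flatten the lists of all rows into a single sequence.
flat_a = list(chain.from_iterable(a))  # [1, 2, 3, 8, 9]
flat_b = list(chain.from_iterable(b))  # [2, 10, 6]

# One search over the flattened column, ignoring row boundaries.
column_wide = [bisect_left(flat_a, needle) for needle in flat_b]
print(column_wide)  # [1, 5, 3]
```

The single flattened result is what then gets imploded and repeated on both rows in the output above.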

If I do not explode the columns I get the error message

exceptions.InvalidOperationError: `search_sorted` operation not supported for dtype `list[i64]`

Edit 2:

I can use search_sorted inside list.eval on one of the lists, but then I cannot reference the list in the other column:

res = df.lazy().select(
    [
        pl.col("a").list.eval(pl.element().search_sorted(pl.col("b"), side="left")).implode().alias("c")
    ]
)

Leads to the error:

exceptions.ComputeError: named columns are not allowed in `arr.eval`; consider using `element` or `col("")`

score 0 · Answer 1 · answered Jun 10 '23 at 17:03

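This gives the per-row result, but it seems like it might be inefficient: add a row number, group by it, and explode both columns within each group so that search_sorted only sees one row's lists at a time.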

res = df.lazy().with_row_count().groupby("row_nr").agg(
    [pl.col("a").explode().search_sorted(pl.col("b").explode()).alias("c")]
).collect()
print(res)

Output:

┌────────┬───────────┐
│ row_nr ┆ c         │
│ ---    ┆ ---       │
│ u32    ┆ list[u32] │
╞════════╪═══════════╡
│ 0      ┆ [1]       │
│ 1      ┆ [2, 0]    │
└────────┴───────────┘

Any better suggestions?

This is just a variation of this answer: Filter list using another list as a boolean mask in polars
