I want to run search_sorted between the lists in columns a and b, row by row:
import polars as pl

df = pl.DataFrame(
    {
        "a": [[1, 2, 3], [8, 9]],
        "b": [[2], [10, 6]],
    }
)
print(df)

res = df.lazy().with_columns(
    [
        pl.col("a").explode().search_sorted(pl.col("b").explode(), side="left").implode().alias("c")
    ]
)
print(res.collect())
The result I get is:
shape: (2, 3)
┌───────────┬───────────┬───────────┐
│ a         ┆ b         ┆ c         │
│ ---       ┆ ---       ┆ ---       │
│ list[i64] ┆ list[i64] ┆ list[u32] │
╞═══════════╪═══════════╪═══════════╡
│ [1, 2, 3] ┆ [2]       ┆ [1, 5, 3] │
│ [8, 9]    ┆ [10, 6]   ┆ [1, 5, 3] │
└───────────┴───────────┴───────────┘
But I was hoping for search_sorted to be evaluated per row, not per column, so that the first row gives [1] and the second gives [2, 0].
How can I do that?
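For reference, the per-row behaviour asked for here can be sketched in plain Python with the standard-library bisect module. This is only an illustration of the expected output, not a Polars solution; the helper name search_sorted_per_row is made up for this sketch, and it assumes each list in a is already sorted.

```python
from bisect import bisect_left


def search_sorted_per_row(a_rows, b_rows):
    """For each row, return the left insertion index in a's list
    for every value in b's list (hypothetical helper)."""
    return [
        [bisect_left(a_list, value) for value in b_list]
        for a_list, b_list in zip(a_rows, b_rows)
    ]


a = [[1, 2, 3], [8, 9]]
b = [[2], [10, 6]]

expected = search_sorted_per_row(a, b)
print(expected)  # [[1], [2, 0]]
```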
Edit: I understand why the above does not work. explode concatenates the lists from all rows into a single column, so the search runs over the whole column at once. But I cannot seem to run search_sorted without exploding the columns.
If I do not explode the columns, I get the error message:
exceptions.InvalidOperationError: `search_sorted` operation not supported for dtype `list[i64]`
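The column-wide result from the first attempt can be reproduced in plain Python as well. This is again only a sketch using the standard-library bisect and itertools modules, not a description of Polars internals: flattening both columns first means every value of b is searched in the concatenation of all lists in a.

```python
from bisect import bisect_left
from itertools import chain

a = [[1, 2, 3], [8, 9]]
b = [[2], [10, 6]]

# Mimic explode: concatenate the lists of every row into one flat sequence.
flat_a = list(chain.from_iterable(a))  # [1, 2, 3, 8, 9]
flat_b = list(chain.from_iterable(b))  # [2, 10, 6]

# Search every flattened b value in the flattened a values.
flat_result = [bisect_left(flat_a, value) for value in flat_b]
print(flat_result)  # [1, 5, 3]
```

The output matches the list that ends up repeated in both rows of column c.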
Edit 2:
I can use search_sorted on one of the lists, but then I cannot reference the list in the other column:
res = df.lazy().select(
    [
        pl.col("a").list.eval(pl.element().search_sorted(pl.col("b"), side="left")).implode().alias("c")
    ]
)
This leads to the error:
exceptions.ComputeError: named columns are not allowed in `arr.eval`; consider using `element` or `col("")`