What you want is to be able to utilize the full expression API whilst operating on certain sub-elements or groups. That's what a groupby is!
So ideally we get our DataFrame
into a state where every group corresponds to the elements of one of our lists.
First we start with some data and then we add a row_nr column
that will represent our unique groups.
import polars as pl

df = pl.DataFrame({
    "idx": [[0], [1], [0, 2]],
    "array": [["a", "b"], ["c", "d"], ["e", "f", "g"]]
}).with_row_count("row_nr")  # adds a "row_nr" column: one unique id per original row
print(df)
shape: (3, 3)
┌────────┬───────────┬─────────────────┐
│ row_nr ┆ idx       ┆ array           │
│ ---    ┆ ---       ┆ ---             │
│ u32    ┆ list[i64] ┆ list[str]       │
╞════════╪═══════════╪═════════════════╡
│ 0      ┆ [0]       ┆ ["a", "b"]      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1      ┆ [1]       ┆ ["c", "d"]      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2      ┆ [0, 2]    ┆ ["e", "f", "g"] │
└────────┴───────────┴─────────────────┘
Next we explode
by the "idx"
column so that we can create the groups for our groupby.
df = df.explode("idx")
print(df)
shape: (4, 3)
┌────────┬─────┬─────────────────┐
│ row_nr ┆ idx ┆ array           │
│ ---    ┆ --- ┆ ---             │
│ u32    ┆ i64 ┆ list[str]       │
╞════════╪═════╪═════════════════╡
│ 0      ┆ 0   ┆ ["a", "b"]      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1      ┆ 1   ┆ ["c", "d"]      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2      ┆ 0   ┆ ["e", "f", "g"] │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2      ┆ 2   ┆ ["e", "f", "g"] │
└────────┴─────┴─────────────────┘
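If it helps to see what the explode step does to the rows, here is a rough plain-Python sketch of the same idea. It does not use polars at all; the list-of-dicts layout and the variable names are only an illustration, not how polars stores or processes the data.

```python
# Plain-Python illustration of exploding the "idx" column (not polars itself).
rows = [
    {"row_nr": 0, "idx": [0], "array": ["a", "b"]},
    {"row_nr": 1, "idx": [1], "array": ["c", "d"]},
    {"row_nr": 2, "idx": [0, 2], "array": ["e", "f", "g"]},
]

# One output row per element of "idx"; the other columns are repeated as-is.
exploded = [
    {"row_nr": r["row_nr"], "idx": i, "array": r["array"]}
    for r in rows
    for i in r["idx"]
]

for r in exploded:
    print(r)
```

Row 2 now appears twice, once per index, and "row_nr" still tells us which original list each index belongs to.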
Finally we can apply the groupby
and take the subelements for each list/group.
(df
 .groupby("row_nr")
 .agg([
     pl.col("array").first(),
     pl.col("idx"),
     pl.col("array").first().take(pl.col("idx")).alias("arr_taken")
 ])
)
This returns:
shape: (3, 4)
┌────────┬─────────────────┬───────────┬────────────┐
│ row_nr ┆ array           ┆ idx       ┆ arr_taken  │
│ ---    ┆ ---             ┆ ---       ┆ ---        │
│ u32    ┆ list[str]       ┆ list[i64] ┆ list[str]  │
╞════════╪═════════════════╪═══════════╪════════════╡
│ 0      ┆ ["a", "b"]      ┆ [0]       ┆ ["a"]      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1      ┆ ["c", "d"]      ┆ [1]       ┆ ["d"]      │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2      ┆ ["e", "f", "g"] ┆ [0, 2]    ┆ ["e", "g"] │
└────────┴─────────────────┴───────────┴────────────┘
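To double-check the logic of the final step, the group-then-take part can also be sketched in plain Python. This is only a conceptual equivalent under my own assumptions (tuples for rows, a dict for the groups); polars does this work with its expression engine, not like this.

```python
from collections import defaultdict

# Exploded rows as (row_nr, idx, array) tuples, copied from the exploded table shown earlier.
exploded = [
    (0, 0, ["a", "b"]),
    (1, 1, ["c", "d"]),
    (2, 0, ["e", "f", "g"]),
    (2, 2, ["e", "f", "g"]),
]

# Group the rows by row_nr, like .groupby("row_nr").
groups = defaultdict(list)
for row_nr, idx, array in exploded:
    groups[row_nr].append((idx, array))

result = {}
for row_nr, members in groups.items():
    array = members[0][1]                         # like pl.col("array").first()
    indices = [idx for idx, _ in members]         # like pl.col("idx")
    result[row_nr] = [array[i] for i in indices]  # like .take(pl.col("idx"))

print(result)  # {0: ['a'], 1: ['d'], 2: ['e', 'g']}
```

The values match the "arr_taken" column above, one list per original row.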