(Polars) How to get element from a column with list by index specified in another column

Question

I have a dataframe with two columns: the first contains lists and the second contains integer indexes. How can I get the element of each list at the index given in the second column? Even better, how can I put that element in a third column? For example, how, from this

a = pl.DataFrame([{'lst': [1, 2, 3], 'ind': 1}, {'lst': [4, 5, 6], 'ind': 2}])
┌───────────┬─────┐
│ lst       ┆ ind │
│ ---       ┆ --- │
│ list[i64] ┆ i64 │
╞═══════════╪═════╡
│ [1, 2, 3] ┆ 1   │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ [4, 5, 6] ┆ 2   │
└───────────┴─────┘

can you get this?

b = pl.DataFrame([{'lst': [1, 2, 3], 'ind': 1, 'list[ind]': 2}, {'lst': [4, 5, 6], 'ind': 2, 'list[ind]': 6}])
┌───────────┬─────┬───────────┐
│ lst       ┆ ind ┆ list[ind] │
│ ---       ┆ --- ┆ ---       │
│ list[i64] ┆ i64 ┆ i64       │
╞═══════════╪═════╪═══════════╡
│ [1, 2, 3] ┆ 1   ┆ 2         │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ [4, 5, 6] ┆ 2   ┆ 6         │
└───────────┴─────┴───────────┘

Thanks.

cccs31 · Accepted Answer · 2022-10-28T16:08:46.223

Edit

As of Python Polars 0.14.24, this can be done more easily with arr.get, which accepts an expression as the index:

df.with_column(pl.col("lst").arr.get(pl.col("ind")).alias("list[ind]"))

Original answer

You can use with_row_count() to add a row count column for grouping, then explode() the list so that each list element is on its own row. Then call take() with the ind column and apply over() on the row count column, so that the element is selected within each row's own group.

df = pl.DataFrame({"lst": [[1, 2, 3], [4, 5, 6]], "ind": [1, 2]})

df = (
    df.with_row_count()
    .with_column(
        pl.col("lst").explode().take(pl.col("ind")).over(pl.col("row_nr")).alias("list[ind]")
    )
    .drop("row_nr")
)

shape: (2, 3)
┌───────────┬─────┬───────────┐
│ lst       ┆ ind ┆ list[ind] │
│ ---       ┆ --- ┆ ---       │
│ list[i64] ┆ i64 ┆ i64       │
╞═══════════╪═════╪═══════════╡
│ [1, 2, 3] ┆ 1   ┆ 2         │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ [4, 5, 6] ┆ 2   ┆ 6         │
└───────────┴─────┴───────────┘

This is ingenious. Thanks. I'll test its performance and report back. — Kaster, Oct 26 '22 at 07:40
Pandas solution (df['lst'].str[0]) does not make much sense but it is still much better than such a complicated solution :/ — the_economist, Jan 21 '23 at 21:30
this was changed in polars - .arr was renamed to .list --> https://pola-rs.github.io/polars/py-polars/html/reference/expressions/list.html — genegc, Aug 18 '23 at 16:24

score 2 · Answer 2 · answered Oct 26 '22 at 05:36

Here is my approach:

Create a custom function to get the values as per the required index.

def get_elem(d):
    sel_idx = d[0]
    return d[1][sel_idx]

Here is some test data:

df = pl.DataFrame({'lista':[[1,2,3],[4,5,6]],'idx':[1,2]})

Now create a struct from these two columns (each row is passed to the function as a dict) and apply the function above:

df.with_columns([
    pl.struct(['idx','lista']).apply(lambda x: get_elem(list(x.values()))).alias('req_elem')])

shape: (2, 3)
┌───────────┬─────┬──────────┐
│ lista     ┆ idx ┆ req_elem │
│ ---       ┆ --- ┆ ---      │
│ list[i64] ┆ i64 ┆ i64      │
╞═══════════╪═════╪══════════╡
│ [1, 2, 3] ┆ 1   ┆ 2        │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ [4, 5, 6] ┆ 2   ┆ 6        │
└───────────┴─────┴──────────┘

Thanks. Let's wait if it's possible using native polars api for performance reasons. My actual dataframe is pretty big so performance matters. — Kaster, Oct 26 '22 at 06:12

NedDasty · Answer 3 · 2022-10-27T14:39:13.200

If the number of unique index values isn't absolutely massive, you can build a when/then expression that selects based on the value of the ind column, using the list get(idx) method (arr.get in the Polars version used here, list.get in later releases):

import polars as pl

df = pl.DataFrame([{"lst": [1, 2, 3], "ind": 1}, {"lst": [4, 5, 6], "ind": 2}])

# create when/then expression for each unique index
idxs = df["ind"].unique()
ind, lst = pl.col("ind"), pl.col("lst") # makes expression generator look cleaner

expr = pl.when(ind == idxs[0]).then(lst.arr.get(idxs[0]))
for idx in idxs[1:]:
    expr = expr.when(ind == idx).then(lst.arr.get(idx))
expr = expr.otherwise(None)

df.select(expr)

shape: (2, 1)
┌─────┐
│ lst │
│ --- │
│ i64 │
╞═════╡
│ 2   │
├╌╌╌╌╌┤
│ 6   │
└─────┘
