How to get current index of element in polars list

Question

When evaluating list elements, I would like to know and use the current index. Is there already a way of doing this?

Something like pl.element().idx()?

import polars as pl

data = {"a": [[1,2,3],[4,5,6]]}
schema = {"a": pl.List(pl.Int8)}

df = pl.DataFrame(data, schema=schema).with_columns([
    # pl.element().idx() is hypothetical (Polars has no such method); it shows the desired API
    pl.col("a").list.eval(pl.element() * pl.element().idx())
])

Expected result:

┌─────────────┐
│ a           │
│ ---         │
│ list[u8]    │
╞═════════════╡
│ [0, 2, 6]   │
│ [0, 5, 12]  │
└─────────────┘

This feels like an XY problem. What do you want to achieve? — ritchie46, Jun 21 '23 at 12:42

score 2 · Accepted Answer · answered Jun 21 '23 at 13:20

The best way I know of is to add a row index, explode the lists, use cumcount as a window function over the row index to create the per-list index (I'm calling it j), and then put the lists back together with groupby/agg.

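(
    df
        .with_row_count('i')  # row index: which original list each element belongs to
        .explode('a')  # one row per list element
        # position of each element within its original list
        .with_columns(j=pl.first().cumcount().over('i'))
        .with_columns(new=pl.col('a')*pl.col('j'))
        .groupby('i', maintain_order=True)  # reassemble the lists in the original row order
        .agg(pl.col('new'))
        .drop('i')
)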

score 1 · Answer 2 · answered Jun 21 '23 at 12:27

You can use the apply method together with Python's built-in enumerate function to access each element's index within its list. For example:

import polars as pl

data = {"a": [[1,2,3],[4,5,6]]}
schema = {"a": pl.List(pl.Int8)}

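df = pl.DataFrame(data, schema=schema).with_columns([
    # Python lambda called once per row; enumerate yields (index, element) pairs.
    # In Polars >= 0.19, Expr.apply was renamed to Expr.map_elements.
    pl.col("a").apply(lambda arr: [x * i for i, x in enumerate(arr)])
])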

print(df)

Output

shape: (2, 1)
┌────────────┐
│ a          │
│ ---        │
│ list[i64]  │
╞════════════╡
│ [0, 2, 6]  │
│ [0, 5, 12] │
└────────────┘

Thanks for the answer, but I would like to avoid apply. When working with large data sets, apply is too slow. — wKollendorf, Jun 21 '23 at 13:02
