
I have a dataframe whose column b contains lists. I need to create a column c that counts the number of elements in the list for each row. Here is a toy example in Pandas:

import pandas as pd

df = pd.DataFrame({'a': [1,2,3], 'b':[[1,2,3], [2], [5,0]]})

    a   b
0   1   [1, 2, 3]
1   2   [2]
2   3   [5, 0]

df.assign(c=df['b'].str.len())

    a   b           c
0   1   [1, 2, 3]   3
1   2   [2]         1
2   3   [5, 0]      2

Here is my equivalent in Polars:

import polars as pl

dfp = pl.DataFrame({'a': [1,2,3], 'b':[[1,2,3], [2], [5,0]]})

dfp.with_columns(pl.col('b').apply(lambda x: len(x)).alias('c'))

I've a feeling that .apply(lambda x: len(x)) is not optimal.

Is there a better way to do it in Polars?

Quant Christo

2 Answers


Update: The .arr namespace was renamed to .list in v0.18.0

You can use .list.lengths()

df.with_columns(c = pl.col("b").list.lengths())
shape: (3, 3)
┌─────┬───────────┬─────┐
│ a   ┆ b         ┆ c   │
│ --- ┆ ---       ┆ --- │
│ i64 ┆ list[i64] ┆ u32 │
╞═════╪═══════════╪═════╡
│ 1   ┆ [1, 2, 3] ┆ 3   │
│ 2   ┆ [2]       ┆ 1   │
│ 3   ┆ [5, 0]    ┆ 2   │
└─────┴───────────┴─────┘
jqurious

Since Polars 0.18.0, the arr namespace has been renamed to list, so a working solution (adapted from the example above) is:

df.with_columns(pl.col("b").list.lengths().alias("c"))
EricLeer