In a Polars groupby aggregation, how do you concatenate string values in each group?

Question

When grouping a Polars dataframe in Python, how do you concatenate string values from a single column across rows within each group?

For example, given the following DataFrame:

import polars as pl

df = pl.DataFrame(
    {
        "col1": ["a", "b", "a", "b", "c"],
        "col2": ["val1", "val2", "val1", "val3", "val3"]
    }
)

Original df:

shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ str  │
╞══════╪══════╡
│ a    ┆ val1 │
│ b    ┆ val2 │
│ a    ┆ val1 │
│ b    ┆ val3 │
│ c    ┆ val3 │
└──────┴──────┘

I want to run a groupby operation, like:


df.groupby('col1').agg(
    col2_g = pl.col('col2').some_function_like_join(',')
)

The expected output is:

┌──────┬───────────┐
│ col1 ┆ col2_g    │
│ ---  ┆ ---       │
│ str  ┆ str       │
╞══════╪═══════════╡
│ a    ┆ val1,val1 │
│ b    ┆ val2,val3 │
│ c    ┆ val3      │
└──────┴───────────┘

What is the name of the some_function_like_join function?

I have tried the following, and none of them work:

df.groupby('col1').agg(pl.col('col2').arr.concat(','))
df.groupby('col1').agg(pl.col('col2').join(','))
df.groupby('col1').agg(pl.col('col2').arr.join(','))

It's not clear from your question what you want the output to be. — Dean MacGregor, May 11 '23 at 01:58

score 3 · Accepted Answer · answered May 11 '23 at 08:39

If you want to concatenate them, I assume you want the result as a string with your specified delimiter:

out = df.groupby("col1").agg(
    pl.col("col2").str.concat(",")
)

Result:

shape: (3, 2)
┌──────┬───────────┐
│ col1 ┆ col2      │
│ ---  ┆ ---       │
│ str  ┆ str       │
╞══════╪═══════════╡
│ a    ┆ val1,val1 │
│ b    ┆ val2,val3 │
│ c    ┆ val3      │
└──────┴───────────┘

If you want the values collected in a List instead, aggregate the column directly:

out = df.groupby("col1").agg(
    pl.col("col2")
)

Result:

shape: (3, 2)
┌──────┬──────────────────┐
│ col1 ┆ col2             │
│ ---  ┆ ---              │
│ str  ┆ list[str]        │
╞══════╪══════════════════╡
│ a    ┆ ["val1", "val1"] │
│ c    ┆ ["val3"]         │
│ b    ┆ ["val2", "val3"] │
└──────┴──────────────────┘

score 0 · Answer 2 · answered May 11 '23 at 00:27

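I think the most straightforward way is to do a with_columns after the agg. The aggregated column will have a List dtype, so its elements can be joined with arr.join (arr.concat concatenates lists rather than joining their elements into a string):

df.groupby('col1').agg(pl.col('col2')).with_columns(pl.col('col2').arr.join(','))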

Wayoshi
