When grouping a Polars dataframe in Python, how do you concatenate string values from a single column across rows within each group?
For example, given the following DataFrame:
import polars as pl

df = pl.DataFrame(
    {
        "col1": ["a", "b", "a", "b", "c"],
        "col2": ["val1", "val2", "val1", "val3", "val3"],
    }
)
Original df:
shape: (5, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ str  ┆ str  │
╞══════╪══════╡
│ a    ┆ val1 │
│ b    ┆ val2 │
│ a    ┆ val1 │
│ b    ┆ val3 │
│ c    ┆ val3 │
└──────┴──────┘
I want to run a groupby operation, like:
df.groupby('col1').agg(
    col2_g = pl.col('col2').some_function_like_join(',')
)
The expected output is:
┌──────┬───────────┐
│ col1 ┆ col2_g    │
│ ---  ┆ ---       │
│ str  ┆ str       │
╞══════╪═══════════╡
│ a    ┆ val1,val1 │
│ b    ┆ val2,val3 │
│ c    ┆ val3      │
└──────┴───────────┘
What is the actual name of the function I have written above as some_function_like_join?
I have tried the following, and none of them work:
df.groupby('col1').agg(pl.col('col2').arr.concat(','))
df.groupby('col1').agg(pl.col('col2').join(','))
df.groupby('col1').agg(pl.col('col2').arr.join(','))