Combining multiple groups in Polars

Question

I have a dataframe like this:

category	year	count
apple	2022	5
apple	2021	8
banana	2022	1
cold	2022	9
cold	2021	2
warm	2022	1
warm	2021	3

I need to group the rows based on a pre-set list of groupings ('fruit', 'temperature') and then aggregate by year. The final DF would look like this:

category	year	count
fruit	2022	6
fruit	2021	8
temp	2022	10
temp	2021	5

The Category values are strings. I'm looking for any solution to make this work. The actual dataframe is quite a bit longer, so I'm hoping to use something like a dict with the groupings to aggregate.

Wayoshi · Answer 1 · 2023-05-22T17:58:43.983

I would map_dict the category column to standardize that to fruit/temperature, then a groupby:

md = {"apple": "fruit", "banana": "fruit", "cold": "temp", "warm": "temp"}

df.with_columns(pl.col("category").map_dict(md)).groupby("category", "year").sum()

shape: (4, 3)
┌──────────┬──────┬───────┐
│ category ┆ year ┆ count │
│ ---      ┆ ---  ┆ ---   │
│ str      ┆ i64  ┆ i64   │
╞══════════╪══════╪═══════╡
│ fruit    ┆ 2022 ┆ 6     │
│ temp     ┆ 2022 ┆ 10    │
│ fruit    ┆ 2021 ┆ 8     │
│ temp     ┆ 2021 ┆ 5     │
└──────────┴──────┴───────┘

You could also do a when/then chain to standardize the category column, but in more complicated examples map_dict will be more concise code.

Combining multiple groups in Polars

1 Answers1