-1

I have a dataframe like this:

category year count
apple 2022 5
apple 2021 8
banana 2022 1
cold 2022 9
cold 2021 2
warm 2022 1
warm 2021 3

I need to group the rows based on a pre-set list of groupings ('fruit', 'temperature') and then aggregate by year. The final DF would look like this:

category year count
fruit 2022 6
fruit 2021 8
temp 2022 10
temp 2021 5

The Category values are strings. I'm looking for any solution to make this work. The actual dataframe is quite a bit longer, so I'm hoping to use something like a dict with the groupings to aggregate.

TylerH
  • 20,799
  • 66
  • 75
  • 101

1 Answers1

1

I would map_dict the category column to standardize that to fruit/temperature, then a groupby:

md = {"apple": "fruit", "banana": "fruit", "cold": "temp", "warm": "temp"}

df.with_columns(pl.col("category").map_dict(md)).groupby("category", "year").sum()
shape: (4, 3)
┌──────────┬──────┬───────┐
│ category ┆ year ┆ count │
│ ---      ┆ ---  ┆ ---   │
│ str      ┆ i64  ┆ i64   │
╞══════════╪══════╪═══════╡
│ fruit    ┆ 2022 ┆ 6     │
│ temp     ┆ 2022 ┆ 10    │
│ fruit    ┆ 2021 ┆ 8     │
│ temp     ┆ 2021 ┆ 5     │
└──────────┴──────┴───────┘

You could also do a when/then chain to standardize the category column, but in more complicated examples map_dict will be more concise code.

Wayoshi
  • 1,688
  • 1
  • 7