I'm trying to do an aggregation on a polars DataFrame, but I'm not getting the result I expect.

This is a minimal replication of the issue:


import polars as pl

# Create a DataFrame
df = pl.DataFrame({"category": ["A", "A", "B", "B", "B"],
"value": [1., 2., 3., 4., 5.]})

# Group by 'category' and sum 'value'
result = df.groupby("category").agg({"value": pl.sum})

# Print the result
print(result)

And I'm getting:

┌──────────┬─────────────────┐
│ category ┆ value           │
│ ---      ┆ ---             │
│ str      ┆ list[f64]       │
╞══════════╪═════════════════╡
│ A        ┆ [1.0, 2.0]      │
│ B        ┆ [3.0, 4.0, 5.0] │
└──────────┴─────────────────┘

and I'd like to get:

┌──────────┬─────────────────┐
│ category ┆ value           │
│ ---      ┆ ---             │
│ str      ┆ f64             │
╞══════════╪═════════════════╡
│ A        ┆ 3.0             │
│ B        ┆ 12.0            │
└──────────┴─────────────────┘

Any ideas on where the issue is? Thanks in advance.

Jose Nuñez

1 Answer

agg takes expressions rather than a dict, so pass the aggregation as an expression:

result = df.groupby("category").agg(pl.col('value').sum())

Wayoshi