In a Polars dataframe, I know that I can aggregate over groups of rows that share the same value in a column, for example with .groupby("first_name").agg([...]).

How can I aggregate over all rows in a dataframe?

For example, I'd like to get the mean of all values in a column.

bwooster

1 Answer

As suggested by @jqurious, you can use mean() directly inside select or with_columns to get the mean over all rows; no groupby is needed.

Examples.

import polars as pl

# sample dataframe
df = pl.DataFrame({
    'text':['a','a','b','b'],
    'value':[1,2,3,4]
})

shape: (4, 2)
┌──────┬───────┐
│ text ┆ value │
│ ---  ┆ ---   │
│ str  ┆ i64   │
╞══════╪═══════╡
│ a    ┆ 1     │
│ a    ┆ 2     │
│ b    ┆ 3     │
│ b    ┆ 4     │
└──────┴───────┘

# compute the mean with select
df.select(
    value_mean = pl.mean('value')
)

shape: (1, 1)
┌────────────┐
│ value_mean │
│ ---        │
│ f64        │
╞════════════╡
│ 2.5        │
└────────────┘

# add the mean with with_columns

df.with_columns(
    value_mean = pl.mean('value')
)

shape: (4, 3)
┌──────┬───────┬────────────┐
│ text ┆ value ┆ value_mean │
│ ---  ┆ ---   ┆ ---        │
│ str  ┆ i64   ┆ f64        │
╞══════╪═══════╪════════════╡
│ a    ┆ 1     ┆ 2.5        │
│ a    ┆ 2     ┆ 2.5        │
│ b    ┆ 3     ┆ 2.5        │
│ b    ┆ 4     ┆ 2.5        │
└──────┴───────┴────────────┘

With select, only the columns specified in the call appear in the result. With with_columns, all existing columns appear in the result, plus any columns you add or modify.

That is why the result of select is a single row, while the result of with_columns has all 4 rows of the sample dataframe, with the mean repeated on each row.

Luca