3

I would like to add a column that is the sum of all columns but some id columns with polars. This can be done using polars.DataFrame.sum(axis=1):

import polars as pl
df = pl.DataFrame(
    {
        "id": [1, 2],
        "cat_a": [2, 7],
        "cat_b": [5, 1],
        "cat_c": [0, 3]
    }
)
df["cat_total"] = df.select(pl.all().exclude("id")).sum(axis=1)
df

However, this really feels like pandas style. I would prefer to be able to have this inside a longer sequence of calls inside a select or with_column call:

# Throws TypeError: sum() got an unexpected keyword argument 'axis'
# because polars.Expr.sum does not support choosing an axis
(df
     # [...]
    .with_column(pl.all().exclude("id").sum(axis=1).alias("cat_total"))
     # [...]
)

How can this be done (without explicitly identifying the column names)?

ritchie46
  • 10,405
  • 1
  • 24
  • 43
datenzauber.ai
  • 379
  • 2
  • 11

2 Answers2

3

You can use a fold expression, which takes an accumulator: acc, a binary function Fn(acc, Series) -> Series and one or more expression to apply the fold on.

df.with_columns(
    pl.fold(0, lambda acc, s: acc + s, pl.all().exclude("id")).alias("horizontal_sum")
)

This would output:

shape: (2, 5)
┌─────┬───────┬───────┬───────┬────────────────┐
│ id  ┆ cat_a ┆ cat_b ┆ cat_c ┆ horizontal_sum │
│ --- ┆ ---   ┆ ---   ┆ ---   ┆ ---            │
│ i64 ┆ i64   ┆ i64   ┆ i64   ┆ i64            │
╞═════╪═══════╪═══════╪═══════╪════════════════╡
│ 1   ┆ 2     ┆ 5     ┆ 0     ┆ 7              │
├╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2   ┆ 7     ┆ 1     ┆ 3     ┆ 11             │
└─────┴───────┴───────┴───────┴────────────────┘

pedrosaurio
  • 4,708
  • 11
  • 39
  • 53
ritchie46
  • 10,405
  • 1
  • 24
  • 43
1

You can also do this combining with_columns and select:

df.with_columns(
    df.select(pl.all().exclude("id")).sum(axis=1).alias("cat_total")
)

Which yields the same result:

shape: (2, 5)
┌─────┬───────┬───────┬───────┬───────────┐
│ id  ┆ cat_a ┆ cat_b ┆ cat_c ┆ cat_total │
│ --- ┆ ---   ┆ ---   ┆ ---   ┆ ---       │
│ i64 ┆ i64   ┆ i64   ┆ i64   ┆ i64       │
╞═════╪═══════╪═══════╪═══════╪═══════════╡
│ 1   ┆ 2     ┆ 5     ┆ 0     ┆ 7         │
│ 2   ┆ 7     ┆ 1     ┆ 3     ┆ 11        │
└─────┴───────┴───────┴───────┴───────────┘
wolfie27k
  • 11
  • 1
  • 2
    [`pl.sum()`](https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.sum.html#polars.sum) has since been added if that's helpful - e.g. `df.with_columns(pl.sum(pl.exclude("id")))` – jqurious Jan 28 '23 at 16:16