1

I have multi columns which name startswith "ts" like "ts_1, ts_2, ts_3,etc" , I want to sum these f64 value row by row, but I don't know exactly the column names. If I use regex like pl.col('^ts.*$'). How to sum these value?

Hakase
  • 211
  • 1
  • 12

1 Answers1

3

Update: Polars >= 0.13.60

As of Polars 0.13.60, you can now use polars.sum with a regex argument to polars.col.

df.with_column(
    pl.sum(pl.col("^ts_.*$")).alias('ts_sum')
)
shape: (4, 6)
┌──────┬──────┬──────┬───────┬──────┬────────┐
│ ts_1 ┆ a    ┆ ts_2 ┆ b     ┆ ts_3 ┆ ts_sum │
│ ---  ┆ ---  ┆ ---  ┆ ---   ┆ ---  ┆ ---    │
│ i64  ┆ i64  ┆ i64  ┆ i64   ┆ i64  ┆ i64    │
╞══════╪══════╪══════╪═══════╪══════╪════════╡
│ 1    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 111    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 112    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 113    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 114    │
└──────┴──────┴──────┴───────┴──────┴────────┘

Other methods

There are two ways to accomplish this. Starting with this data:

import polars as pl

df = pl.DataFrame({
    'ts_1': [1, 2, 3, 4],
    'a': [-100] * 4,
    'ts_2': [10] * 4,
    'b': [-1000] * 4,
    'ts_3': [100] * 4,
})
df
shape: (4, 5)
┌──────┬──────┬──────┬───────┬──────┐
│ ts_1 ┆ a    ┆ ts_2 ┆ b     ┆ ts_3 │
│ ---  ┆ ---  ┆ ---  ┆ ---   ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ i64   ┆ i64  │
╞══════╪══════╪══════╪═══════╪══════╡
│ 1    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 4    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  │
└──────┴──────┴──────┴───────┴──────┘

As of Polars 0.13.59, you can use a regex expression in fold.

df.with_column(
    pl.fold(acc=pl.lit(0),
            f=lambda acc, x: acc + x,
            exprs=pl.col("^ts_.*$"))
    .alias("ts_sum")
)
shape: (4, 6)
┌──────┬──────┬──────┬───────┬──────┬────────┐
│ ts_1 ┆ a    ┆ ts_2 ┆ b     ┆ ts_3 ┆ ts_sum │
│ ---  ┆ ---  ┆ ---  ┆ ---   ┆ ---  ┆ ---    │
│ i64  ┆ i64  ┆ i64  ┆ i64   ┆ i64  ┆ i64    │
╞══════╪══════╪══════╪═══════╪══════╪════════╡
│ 1    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 111    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 112    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 113    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 114    │
└──────┴──────┴──────┴───────┴──────┴────────┘

The other option is to use the fold-like properties of polars.sum. polars.sum will sum horizontally when passed a list of Expressions. For example:

df.with_column(
    pl.sum(
        [col_nm for col_nm in df.columns
         if col_nm.startswith(r"ts_")]
    ).alias("ts_sum")
)
shape: (4, 6)
┌──────┬──────┬──────┬───────┬──────┬────────┐
│ ts_1 ┆ a    ┆ ts_2 ┆ b     ┆ ts_3 ┆ ts_sum │
│ ---  ┆ ---  ┆ ---  ┆ ---   ┆ ---  ┆ ---    │
│ i64  ┆ i64  ┆ i64  ┆ i64   ┆ i64  ┆ i64    │
╞══════╪══════╪══════╪═══════╪══════╪════════╡
│ 1    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 111    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 112    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 113    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 114    │
└──────┴──────┴──────┴───────┴──────┴────────┘

If needed, you can use regex-type filtering, using a regex module such as re.

import re
df.with_column(
    pl.sum(
        [col_nm for col_nm in df.columns
         if re.search(r"^ts_.*$", col_nm)]
    ).alias("ts_sum")
)
shape: (4, 6)
┌──────┬──────┬──────┬───────┬──────┬────────┐
│ ts_1 ┆ a    ┆ ts_2 ┆ b     ┆ ts_3 ┆ ts_sum │
│ ---  ┆ ---  ┆ ---  ┆ ---   ┆ ---  ┆ ---    │
│ i64  ┆ i64  ┆ i64  ┆ i64   ┆ i64  ┆ i64    │
╞══════╪══════╪══════╪═══════╪══════╪════════╡
│ 1    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 111    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 112    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 113    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4    ┆ -100 ┆ 10   ┆ -1000 ┆ 100  ┆ 114    │
└──────┴──────┴──────┴───────┴──────┴────────┘
  • Polars 0.13.60 added a new, easy way to use a regex expression in `polars.sum`. I've update the answer to reflect this. –  Aug 04 '22 at 15:09