I have multi columns which name startswith "ts" like "ts_1, ts_2, ts_3,etc" , I want to sum these f64 value row by row, but I don't know exactly the column names. If I use regex like pl.col('^ts.*$')
. How to sum these value?
Asked
Active
Viewed 895 times
1

Hakase
- 211
- 1
- 12
1 Answers
3
Update: Polars >= 0.13.60
As of Polars 0.13.60, you can now use polars.sum
with a regex argument to polars.col
.
df.with_column(
pl.sum(pl.col("^ts_.*$")).alias('ts_sum')
)
shape: (4, 6)
┌──────┬──────┬──────┬───────┬──────┬────────┐
│ ts_1 ┆ a ┆ ts_2 ┆ b ┆ ts_3 ┆ ts_sum │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞══════╪══════╪══════╪═══════╪══════╪════════╡
│ 1 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 ┆ 111 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 ┆ 112 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 ┆ 113 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 ┆ 114 │
└──────┴──────┴──────┴───────┴──────┴────────┘
Other methods
There are two ways to accomplish this. Starting with this data:
import polars as pl
df = pl.DataFrame({
'ts_1': [1, 2, 3, 4],
'a': [-100] * 4,
'ts_2': [10] * 4,
'b': [-1000] * 4,
'ts_3': [100] * 4,
})
df
shape: (4, 5)
┌──────┬──────┬──────┬───────┬──────┐
│ ts_1 ┆ a ┆ ts_2 ┆ b ┆ ts_3 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞══════╪══════╪══════╪═══════╪══════╡
│ 1 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 4 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 │
└──────┴──────┴──────┴───────┴──────┘
As of Polars 0.13.59, you can use a regex expression in fold
.
df.with_column(
pl.fold(acc=pl.lit(0),
f=lambda acc, x: acc + x,
exprs=pl.col("^ts_.*$"))
.alias("ts_sum")
)
shape: (4, 6)
┌──────┬──────┬──────┬───────┬──────┬────────┐
│ ts_1 ┆ a ┆ ts_2 ┆ b ┆ ts_3 ┆ ts_sum │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞══════╪══════╪══════╪═══════╪══════╪════════╡
│ 1 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 ┆ 111 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 ┆ 112 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 ┆ 113 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 ┆ 114 │
└──────┴──────┴──────┴───────┴──────┴────────┘
The other option is to use the fold-like properties of polars.sum
. polars.sum
will sum horizontally when passed a list of Expressions. For example:
df.with_column(
pl.sum(
[col_nm for col_nm in df.columns
if col_nm.startswith(r"ts_")]
).alias("ts_sum")
)
shape: (4, 6)
┌──────┬──────┬──────┬───────┬──────┬────────┐
│ ts_1 ┆ a ┆ ts_2 ┆ b ┆ ts_3 ┆ ts_sum │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞══════╪══════╪══════╪═══════╪══════╪════════╡
│ 1 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 ┆ 111 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 ┆ 112 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 ┆ 113 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 ┆ 114 │
└──────┴──────┴──────┴───────┴──────┴────────┘
If needed, you can use regex-type filtering, using a regex module such as re
.
import re
df.with_column(
pl.sum(
[col_nm for col_nm in df.columns
if re.search(r"^ts_.*$", col_nm)]
).alias("ts_sum")
)
shape: (4, 6)
┌──────┬──────┬──────┬───────┬──────┬────────┐
│ ts_1 ┆ a ┆ ts_2 ┆ b ┆ ts_3 ┆ ts_sum │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞══════╪══════╪══════╪═══════╪══════╪════════╡
│ 1 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 ┆ 111 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 ┆ 112 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 3 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 ┆ 113 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4 ┆ -100 ┆ 10 ┆ -1000 ┆ 100 ┆ 114 │
└──────┴──────┴──────┴───────┴──────┴────────┘
-
Polars 0.13.60 added a new, easy way to use a regex expression in `polars.sum`. I've update the answer to reflect this. – Aug 04 '22 at 15:09