I have a df as follows:
with n known at runtime.
I need to count 1 and -1 values over the rows.
Namely, I need a new df (or new columns in the old one):
As of polars 0.13.60, you can use polars.sum with an Expression to sum horizontally. (In later Polars releases, this horizontal behavior moved to polars.sum_horizontal.) For example, starting with this data:
import polars as pl
data_frame = (
    pl.DataFrame({
        'col0': [1, -1, 1, -1, 1],
        'col1': [1, 1, 1, 1, 1],
        'col2': [-1, -1, -1, -1, -1],
        'col3': [1, -1, -1, 1, 1],
    })
)
data_frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col0 ┆ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ i64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ -1   ┆ 1    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ -1   ┆ 1    ┆ -1   ┆ -1   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1    ┆ 1    ┆ -1   ┆ -1   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ -1   ┆ 1    ┆ -1   ┆ 1    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1    ┆ 1    ┆ -1   ┆ 1    │
└──────┴──────┴──────┴──────┘
We can sum across all columns horizontally, using polars.all:
(
    data_frame
    .with_columns([
        pl.sum(pl.all() > 0).alias('pos'),
        pl.sum(pl.all() < 0).alias('neg'),
    ])
)
shape: (5, 6)
┌──────┬──────┬──────┬──────┬─────┬─────┐
│ col0 ┆ col1 ┆ col2 ┆ col3 ┆ pos ┆ neg │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- ┆ --- │
│ i64  ┆ i64  ┆ i64  ┆ i64  ┆ i64 ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╪═════╡
│ 1    ┆ 1    ┆ -1   ┆ 1    ┆ 3   ┆ 1   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ -1   ┆ 1    ┆ -1   ┆ -1   ┆ 1   ┆ 3   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1    ┆ 1    ┆ -1   ┆ -1   ┆ 2   ┆ 2   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ -1   ┆ 1    ┆ -1   ┆ 1    ┆ 2   ┆ 2   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1    ┆ 1    ┆ -1   ┆ 1    ┆ 3   ┆ 1   │
└──────┴──────┴──────┴──────┴─────┴─────┘
The above algorithm works because Polars will upcast boolean values to unsigned integers when summing. For example, the expression pl.all() > 0 produces Expressions of type boolean.
(
    data_frame
    .with_columns([
        (pl.all() > 0).keep_name()
    ])
)
shape: (5, 4)
┌───────┬──────┬───────┬───────┐
│ col0  ┆ col1 ┆ col2  ┆ col3  │
│ ---   ┆ ---  ┆ ---   ┆ ---   │
│ bool  ┆ bool ┆ bool  ┆ bool  │
╞═══════╪══════╪═══════╪═══════╡
│ true  ┆ true ┆ false ┆ true  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ false ┆ true ┆ false ┆ false │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ true  ┆ true ┆ false ┆ false │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ false ┆ true ┆ false ┆ true  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ true  ┆ true ┆ false ┆ true  │
└───────┴──────┴───────┴───────┘
polars.sum will then convert these to unsigned integers as it sums them horizontally.
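If it helps to see the same idea without Polars, here is a minimal plain-Python sketch of the counting trick. It is only an illustration of "sum the booleans per row", not how Polars implements it; the names rows and count_signs are made up for this example, and the data is copied from the frame above.

```python
# Plain-Python illustration (not Polars): True/False behave as 1/0 when summed,
# so summing a comparison across a row counts the values that match.
rows = [
    [1, 1, -1, 1],
    [-1, 1, -1, -1],
    [1, 1, -1, -1],
    [-1, 1, -1, 1],
    [1, 1, -1, 1],
]


def count_signs(row):
    """Return (pos, neg): how many values in the row are > 0 and < 0."""
    pos = sum(value > 0 for value in row)  # each True adds 1
    neg = sum(value < 0 for value in row)
    return pos, neg


# Works for any number of columns, since each row is just iterated over.
counts = [count_signs(row) for row in rows]
print(counts)  # [(3, 1), (1, 3), (2, 2), (2, 2), (3, 1)]
```

The pairs line up with the pos and neg columns in the Polars output above.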
For examples of how to select only certain columns (by name, by type, by regular expression, etc.), see this StackOverflow response.