I have a df as follows:

[image: input DataFrame]

where n is known only at runtime.

I need to count the 1 and -1 values in each row.

Namely, I need a new df (or new columns in the old one):

[image: desired output]

Any advice?

FObersteiner
Sigi
  • I solved it with an example from the guide, but I don't understand how it works: `data_frame = data_frame.select(pl.fold(acc=pl.lit(0), f=lambda acc, x: acc + x, exprs=pl.col("*") > 0).alias("sum"))` – Sigi Aug 05 '22 at 08:18
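For what it's worth, the fold in that comment can be read as a left fold across columns: start from an accumulator of zeros, then add each boolean column to it in turn. Below is a minimal plain-Python sketch of that idea. It is not how Polars implements `pl.fold`, and the names `fold_columns` and `add_elementwise` are made up for illustration.

```python
from functools import reduce

# Each inner list stands in for one column of the frame.
columns = [
    [1, -1, 1, -1, 1],
    [1, 1, 1, 1, 1],
    [-1, -1, -1, -1, -1],
    [1, -1, -1, 1, 1],
]


def fold_columns(acc, f, cols):
    """Left-fold f over cols, starting from the accumulator acc."""
    return reduce(f, cols, acc)


def add_elementwise(acc, col):
    """Add one column to the accumulator, position by position."""
    return [a + b for a, b in zip(acc, col)]


# Analogue of `pl.col("*") > 0`: one boolean mask per column.
positive_masks = [[value > 0 for value in col] for col in columns]

# Analogue of `acc=pl.lit(0)`: a zero for every row.
start = [0] * len(columns[0])

pos = fold_columns(start, add_elementwise, positive_masks)
print(pos)
```

Each `True` adds 1 to the running total for its row, so the final accumulator holds the number of positive values per row.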

1 Answer

As of polars 0.13.60, you can use polars.sum with an Expression to sum horizontally. For example, start with this data:

import polars as pl

data_frame = (
    pl.DataFrame({
        'col0': [1, -1, 1, -1, 1],
        'col1': [1, 1, 1, 1, 1],
        'col2': [-1, -1, -1, -1, -1],
        'col3': [1, -1, -1, 1, 1],
    })
)
data_frame
shape: (5, 4)
┌──────┬──────┬──────┬──────┐
│ col0 ┆ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  ┆ i64  │
╞══════╪══════╪══════╪══════╡
│ 1    ┆ 1    ┆ -1   ┆ 1    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ -1   ┆ 1    ┆ -1   ┆ -1   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1    ┆ 1    ┆ -1   ┆ -1   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ -1   ┆ 1    ┆ -1   ┆ 1    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1    ┆ 1    ┆ -1   ┆ 1    │
└──────┴──────┴──────┴──────┘

We can count the positive and negative values in each row by summing the boolean comparisons horizontally, using polars.all to select every column.

(
    data_frame
    .with_columns([
        pl.sum(pl.all() > 0).alias('pos'),
        pl.sum(pl.all() < 0).alias('neg'),
    ])
)
shape: (5, 6)
┌──────┬──────┬──────┬──────┬─────┬─────┐
│ col0 ┆ col1 ┆ col2 ┆ col3 ┆ pos ┆ neg │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ --- ┆ --- │
│ i64  ┆ i64  ┆ i64  ┆ i64  ┆ i64 ┆ i64 │
╞══════╪══════╪══════╪══════╪═════╪═════╡
│ 1    ┆ 1    ┆ -1   ┆ 1    ┆ 3   ┆ 1   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ -1   ┆ 1    ┆ -1   ┆ -1   ┆ 1   ┆ 3   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1    ┆ 1    ┆ -1   ┆ -1   ┆ 2   ┆ 2   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ -1   ┆ 1    ┆ -1   ┆ 1    ┆ 2   ┆ 2   │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1    ┆ 1    ┆ -1   ┆ 1    ┆ 3   ┆ 1   │
└──────┴──────┴──────┴──────┴─────┴─────┘

How it works

The above algorithm works because Polars will upcast boolean values to unsigned integers when summing. For example, the expression pl.all() > 0 produces Expressions of type boolean.

(
    data_frame
    .with_columns([
        (pl.all() > 0).keep_name()
    ])
)
shape: (5, 4)
┌───────┬──────┬───────┬───────┐
│ col0  ┆ col1 ┆ col2  ┆ col3  │
│ ---   ┆ ---  ┆ ---   ┆ ---   │
│ bool  ┆ bool ┆ bool  ┆ bool  │
╞═══════╪══════╪═══════╪═══════╡
│ true  ┆ true ┆ false ┆ true  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ false ┆ true ┆ false ┆ false │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ true  ┆ true ┆ false ┆ false │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ false ┆ true ┆ false ┆ true  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ true  ┆ true ┆ false ┆ true  │
└───────┴──────┴───────┴───────┘

polars.sum will then convert these to unsigned integers as it sums them horizontally.
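To see the same mechanics outside Polars, here is a minimal plain-Python sketch that counts signs per row by summing booleans. It only illustrates the idea and is not Polars code; `count_signs` is a hypothetical helper name, and the rows are copied from the example frame above.

```python
# Rows of the example frame, written as plain Python lists.
rows = [
    [1, 1, -1, 1],
    [-1, 1, -1, -1],
    [1, 1, -1, -1],
    [-1, 1, -1, 1],
    [1, 1, -1, 1],
]


def count_signs(row):
    """Return (positives, negatives) for one row by summing booleans."""
    pos = sum(value > 0 for value in row)  # True counts as 1, False as 0
    neg = sum(value < 0 for value in row)
    return pos, neg


counts = [count_signs(row) for row in rows]
print(counts)
```

The pairs line up with the pos and neg columns in the output above. In more recent Polars releases, as far as I know, the horizontal sum is spelled `pl.sum_horizontal(...)`, so check the documentation for the version you have installed.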

For examples of how to select only certain columns (by name, by type, by regular expression, etc.), see this StackOverflow response.

  • When I use your example I keep getting "argument 'name': 'Expr' object cannot be converted to 'PyString' " error. I believe it has something to do with the pl.sum? –  Aug 05 '22 at 16:58
  • "As of polars 0.13.60, you can use polars.sum with an Expression to sum horizontally." Have you upgraded to Polars version 0.13.60 or above? –  Aug 05 '22 at 17:09
  • forgive my ignorance but when I do pip install polars, doesn't it come with the newest version of polars? –  Aug 05 '22 at 17:11
  • Hmmm, you may have to try `pip install -U polars`. If you previously had polars installed, the `-U` flag will update the installation to the latest version. (Otherwise, it won't update the version.) You can check your current versions of packages with `pip list`. –  Aug 05 '22 at 17:16
  • I'm still getting the same error. strange... I'm on polars 0.13.61 –  Aug 05 '22 at 18:02
  • Ok, I just loaded 0.13.59, and I got the exact same error message you received. So, somehow your environment isn’t seeing the update. Do you need to restart your interpreter or your IDE to load the new version of the Polars package? If you type `pl.__version__`, you can see the version that your IDE/interpreter is using. –  Aug 05 '22 at 18:13
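If it helps anyone hitting the same error, the version check discussed in these comments can be scripted. This is a rough sketch using only the standard library: `installed_version`, `version_tuple` and `MINIMUM` are names invented here, and the parsing is deliberately naive (it keeps only the first three numbers of the version string).

```python
import re
from importlib import metadata

MINIMUM = (0, 13, 60)  # the version named in the answer


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(text):
    """Turn '0.13.61' into (0, 13, 61), keeping only the first three numbers."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


found = installed_version("polars")
if found is None:
    print("polars is not installed in this environment")
elif version_tuple(found) < MINIMUM:
    print(f"polars {found} is older than 0.13.60; try `pip install -U polars`")
else:
    print(f"polars {found} is new enough")
```

Note that this reports what is installed for the interpreter running the script, whereas `pl.__version__` reports the module that is actually loaded. If the two disagree, restarting the interpreter or IDE is the first thing to try.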