Compute standard deviation for polars dataframe rows for set of columns

Question

I would like to calculate the row-wise standard deviation across the columns 'foo' and 'bar' of a dataframe.

I am able to find the min, max and mean, but not the std.

import polars as pl

df = pl.DataFrame(
    {
        "foo": [1, 2, 3],
        "bar": [6, 7, 8],
        "ham": ["a", "b", "c"],
    }
)

# Finding the sum works for me; the same code works for min and max as well.

df = df.select(pl.col('*'),\
        df.select(pl.col(['foo','bar']))\
            .sum(axis=1)\
            .apply(lambda x:round(x,2))\
            .alias('sum'))

However, the code below throws an error when calculating the standard deviation, because the std function has no axis argument.

df = df.select(pl.col('*'),\
        df.select(pl.col(['foo','bar']))\
            .std(axis=1)\
            .apply(lambda x:round(x,2))\
            .alias('std'))

Is there a better way to compute the standard deviation in this scenario?

What should the output be for std? For sum you can do: `df.with_columns(sum = pl.sum(["foo", "bar"]).round(2))` — jqurious, Apr 19 '23 at 06:27
when you execute df.with_columns(stdev= pl.std(["foo", "bar"]).round(2)), it does not work. — Rakesh Chaudhary, Apr 19 '23 at 06:46
Yes, I meant can you show the actual final output dataframe (with values) you're trying to generate as it's not entirely clear from the question. — jqurious, Apr 19 '23 at 07:17

score 0 · Accepted Answer · answered Apr 19 '23 at 07:18

In polars, what other libraries do with axis=1 is covered under "Row wise computations".

See also: https://stackoverflow.com/a/71951543

df = pl.from_repr("""
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ 6   ┆ a   │
│ 2   ┆ 7   ┆ b   │
│ 3   ┆ 8   ┆ c   │
│ 4   ┆ 9   ┆ a   │
└─────┴─────┴─────┘
""")

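# Recent polars releases name the list namespace .list; older ones used .arr.
# Gather the columns into one list per row, aggregate inside the list,
# then unwrap the single resulting value.
df.with_columns(
   sum = pl.concat_list("foo", "bar").list.eval(pl.element().sum()).list.first(),
   std = pl.concat_list("foo", "bar").list.eval(pl.element().std()).list.first()
)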

shape: (4, 5)
┌─────┬─────┬─────┬─────┬──────────┐
│ foo ┆ bar ┆ ham ┆ sum ┆ std      │
│ --- ┆ --- ┆ --- ┆ --- ┆ ---      │
│ i64 ┆ i64 ┆ str ┆ i64 ┆ f64      │
╞═════╪═════╪═════╪═════╪══════════╡
│ 1   ┆ 6   ┆ a   ┆ 7   ┆ 3.535534 │
│ 2   ┆ 7   ┆ b   ┆ 9   ┆ 3.535534 │
│ 3   ┆ 8   ┆ c   ┆ 11  ┆ 3.535534 │
│ 4   ┆ 9   ┆ a   ┆ 13  ┆ 3.535534 │
└─────┴─────┴─────┴─────┴──────────┘
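
The value 3.535534 appears in every row because each pair of values differs by 5, and polars' std() defaults to the sample standard deviation (ddof=1). As a rough cross-check, the same numbers can be reproduced with the standard library alone; the `rows` list here is just the foo/bar pairs from the frame above, typed in by hand for illustration:

```python
from statistics import stdev

# (foo, bar) pairs copied from the example frame above.
rows = [(1, 6), (2, 7), (3, 8), (4, 9)]

# statistics.stdev is the sample standard deviation (divides by n - 1),
# which matches the ddof=1 default of polars' std().
row_std = [round(stdev(pair), 6) for pair in rows]
print(row_std)
```

If the population standard deviation is wanted instead, polars' std() takes a ddof argument, so pl.element().std(ddof=0) should give that.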

Summing is also available via .list.sum() and pl.sum_horizontal() (older releases: .arr.sum() and pl.sum()):

pl.concat_list("foo", "bar").list.sum()

pl.sum_horizontal("foo", "bar")
