Row-wise computations:
You can create a list and access the .arr
namespace for row wise computations.
@ritchie46's answer regarding rank(axis=1)
is also useful reading.
.arr.eval()
can be used for more complex computations.
df = pl.DataFrame([[1, 2, 3], [4, 5, 3], [1, 8, 9]])
(df.with_column(pl.concat_list(pl.all()).alias("row"))
.with_columns([
pl.col("row").arr.sum().alias("sum"),
pl.col("row").arr.mean().alias("mean"),
pl.col("row").arr.eval(pl.all().median(), parallel=True).alias("median"),
pl.col("row").arr.eval(pl.all().rank(), parallel=True).alias("rank"),
])
)
shape: (3, 8)
┌──────────┬──────────┬──────────┬───────────┬─────┬──────┬───────────┬─────────────────┐
│ column_0 | column_1 | column_2 | row | sum | mean | median | rank │
│ --- | --- | --- | --- | --- | --- | --- | --- │
│ i64 | i64 | i64 | list[i64] | i64 | f64 | list[f64] | list[f32] │
╞══════════╪══════════╪══════════╪═══════════╪═════╪══════╪═══════════╪═════════════════╡
│ 1 | 4 | 1 | [1, 4, 1] | 6 | 2.0 | [1.0] | [1.5, 3.0, 1.5] │
├──────────┼──────────┼──────────┼───────────┼─────┼──────┼───────────┼─────────────────┤
│ 2 | 5 | 8 | [2, 5, 8] | 15 | 5.0 | [5.0] | [1.0, 2.0, 3.0] │
├──────────┼──────────┼──────────┼───────────┼─────┼──────┼───────────┼─────────────────┤
│ 3 | 3 | 9 | [3, 3, 9] | 15 | 5.0 | [3.0] | [1.5, 1.5, 3.0] │
└──────────┴──────────┴──────────┴───────────┴─────┴──────┴───────────┴─────────────────┘
Can be given a list of columns.
>>> df.select(pl.sum(pl.all()))
shape: (3, 1)
┌─────┐
│ sum │
│ --- │
│ i64 │
╞═════╡
│ 6 │
├─────┤
│ 15 │
├─────┤
│ 15 │
└─────┘
Can be accessed inside .arr.eval()
pdf = df.to_pandas()
pdf[pdf.rolling(2, axis=1).mean() > 3]
column_0 column_1 column_2
0 NaN NaN NaN
1 NaN 5.0 8.0
2 NaN NaN 9.0
(df.with_column(pl.concat_list(pl.all()).alias("row"))
.with_column(
pl.col("row").arr.eval(
pl.when(pl.all().rolling_mean(2) > 3)
.then(pl.all()),
parallel=True)
.alias("rolling[mean] > 3"))
)
shape: (3, 5)
┌──────────┬──────────┬──────────┬───────────┬────────────────────┐
│ column_0 | column_1 | column_2 | row | rolling[mean] > 3 │
│ --- | --- | --- | --- | --- │
│ i64 | i64 | i64 | list[i64] | list[i64] │
╞══════════╪══════════╪══════════╪═══════════╪════════════════════╡
│ 1 | 4 | 1 | [1, 4, 1] | [null, null, null] │
├──────────┼──────────┼──────────┼───────────┼────────────────────┤
│ 2 | 5 | 8 | [2, 5, 8] | [null, 5, 8] │
├──────────┼──────────┼──────────┼───────────┼────────────────────┤
│ 3 | 3 | 9 | [3, 3, 9] | [null, null, 9] │
└──────────┴──────────┴──────────┴───────────┴────────────────────┘
If you want to "expand" the lists into columns:
- Turn the list into a struct with
.arr.to_struct()
.unnest()
the struct.
- Rename the columns (if needed)
(df.with_column(pl.concat_list(pl.all()).alias("row"))
.select(
pl.col("row").arr.eval(
pl.when(pl.all().rolling_mean(2) > 3)
.then(pl.all()),
parallel=True)
.arr.to_struct()
.alias("rolling[mean]"))
.unnest("rolling[mean]")
)
shape: (3, 3)
┌─────────┬─────────┬─────────┐
│ field_0 | field_1 | field_2 │
│ --- | --- | --- │
│ i64 | i64 | i64 │
╞═════════╪═════════╪═════════╡
│ null | null | null │
├─────────┼─────────┼─────────┤
│ null | 5 | 8 │
├─────────┼─────────┼─────────┤
│ null | null | 9 │
└─────────┴─────────┴─────────┘
.transpose()
You could always transpose the dataframe to switch the axis and use the "regular" api.
(df.transpose()
.select(
pl.when(pl.all().rolling_mean(2) > 3)
.then(pl.all())
.keep_name())
.transpose())
shape: (3, 3)
┌──────────┬──────────┬──────────┐
│ column_0 | column_1 | column_2 │
│ --- | --- | --- │
│ i64 | i64 | i64 │
╞══════════╪══════════╪══════════╡
│ null | null | null │
├──────────┼──────────┼──────────┤
│ null | 5 | 8 │
├──────────┼──────────┼──────────┤
│ null | null | 9 │
└──────────┴──────────┴──────────┘