I have a polars LazyFrame with 3 columns of type nullable list[f64], something like this:
import polars as pl

lf = pl.DataFrame({
    "1": [
        [0.0, 1.1, 2.2],
        [0.0, 1.1, 2.2],
        [0.0, 1.1, 2.2],
        None,
    ],
    "2": [
        [0.3, 1.3, 2.3],
        [0.4, 1.4, 2.4],
        [0.5, 1.5, 2.5],
        None,
    ],
    "3": [
        [0.7, 1.7, 2.7],
        None,
        [0.9, 1.9, 2.9],
        None,
    ],
}).lazy()
┌─────────────────┬─────────────────┬─────────────────┐
│ 1               ┆ 2               ┆ 3               │
│ ---             ┆ ---             ┆ ---             │
│ list[f64]       ┆ list[f64]       ┆ list[f64]       │
╞═════════════════╪═════════════════╪═════════════════╡
│ [0.0, 1.1, 2.2] ┆ [0.3, 1.3, 2.3] ┆ [0.7, 1.7, 2.7] │
│ [0.0, 1.1, 2.2] ┆ [0.4, 1.4, 2.4] ┆ null            │
│ [0.0, 1.1, 2.2] ┆ [0.5, 1.5, 2.5] ┆ [0.9, 1.9, 2.9] │
│ null            ┆ null            ┆ null            │
└─────────────────┴─────────────────┴─────────────────┘
I need to add a column with the average of the three columns' lists. In addition:
- when a row contains only null values, the average should be a zero-filled list of fixed length 3
- when only some items in a row are null, the average should be computed over the non-null lists
By "average of the lists" I mean the element-wise sum divided by the number of lists involved in the sum.
So in the first row I want:
[0.0 + 0.3 + 0.7, 1.1 + 1.3 + 1.7, 2.2 + 2.3 + 2.7] / 3 = [1.0, 4.1, 7.2] / 3 ≈ [0.33, 1.37, 2.40]
In the second row: [0.0 + 0.4, 1.1 + 1.4, 2.2 + 2.4] / 2 = [0.4, 2.5, 4.6] / 2 = [0.2, 1.25, 2.3].
In the last row: [0.0, 0.0, 0.0].
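To pin down the expected result, the rule described above can be written as a plain-Python reference. This is only a sketch of the semantics I am after, not a polars solution; the function name `row_average` and its `length` parameter are made up for illustration, and it assumes all non-null lists in a row have the same length.

```python
from typing import List, Optional, Sequence


def row_average(
    lists: Sequence[Optional[Sequence[float]]],
    length: int = 3,  # hypothetical parameter: fixed list length used for all-null rows
) -> List[float]:
    """Element-wise average of the non-null lists in one row."""
    # Keep only the lists that are actually present in this row.
    present = [lst for lst in lists if lst is not None]
    if not present:
        # Every item is null: return a zero-filled list of fixed length.
        return [0.0] * length
    # Sum position by position, then divide by the number of lists summed.
    return [sum(values) / len(present) for values in zip(*present)]


# One call per row of the example frame.
rows = [
    ([0.0, 1.1, 2.2], [0.3, 1.3, 2.3], [0.7, 1.7, 2.7]),
    ([0.0, 1.1, 2.2], [0.4, 1.4, 2.4], None),
    ([0.0, 1.1, 2.2], [0.5, 1.5, 2.5], [0.9, 1.9, 2.9]),
    (None, None, None),
]
for row in rows:
    print(row_average(row))
```

The column I want to add should hold, for each row, the list this function returns (up to floating-point rounding).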
I found a way to sum the columns:
lf.select(
    pl.sum_horizontal(
        pl.col("*").list.explode()
    ).reshape((1, -1)).alias("sum"),
).collect()
But this only works when all the items in a row are non-null.