For a dataframe, the goal is to compute the mean of one column, a, grouped by another column, b, provided the first value of a in each group is not null. If it is null, the result for that group should just be null.
The sample dataframe:
df = pl.DataFrame({"a": [None, 1, 2, 3, 4], "b": [1, 1, 2, 2, 2]})
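To make the intended result concrete, here is a plain-Python sketch of the logic with no polars involved. The helper name mean_if_first_not_null is made up for illustration, and the two lists simply mirror the columns of the sample dataframe.

```python
from collections import defaultdict

# Same data as the sample dataframe, as plain lists.
a = [None, 1, 2, 3, 4]
b = [1, 1, 2, 2, 2]

# Collect the values of a per group key from b, preserving row order.
groups = defaultdict(list)
for key, value in zip(b, a):
    groups[key].append(value)


def mean_if_first_not_null(values):
    """Return None if the first value is null, else the mean of the non-null values."""
    if values[0] is None:
        return None
    non_null = [v for v in values if v is not None]
    return sum(non_null) / len(non_null)


expected = {key: mean_if_first_not_null(vals) for key, vals in groups.items()}
```

Here group 1 starts with a null, so it maps to None, while group 2 maps to the mean of 2, 3 and 4.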
I tried something like:
df.groupby("b").agg(
    pl.when(pl.col("a").first().is_null()).then(None).otherwise(pl.mean("a"))
)
The results are as expected, but I get a warning saying that when-then-otherwise may not be guaranteed to do its job in a groupby context:
>>> df.groupby("b").agg(pl.when(pl.col("a").first().is_null()).then(None).otherwise(pl.mean("a")))
The predicate 'col("a").first().is_null()' in 'when->then->otherwise' is not a valid aggregation and might produce a different number of rows than the groupby operation would. This behavior is experimental and may be subject to change
shape: (2, 2)
┌─────┬─────────┐
│ b   ┆ literal │
│ --- ┆ ---     │
│ i64 ┆ f64     │
╞═════╪═════════╡
│ 1   ┆ null    │
│ 2   ┆ 3.0     │
└─────┴─────────┘
Why does this warning appear, and what would be a better way to express if-else logic in a groupby?