This is effectively a translation of this pandas answer, and nearly does what you want (simplified, courtesy of @jqurious):
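import polars as pl

df = pl.DataFrame({"num": [0, 29, 28, 4, 0, 0, 13, 0]})

# Rows that share a cumulative sum form one group (a non-zero value plus
# the zeros that follow it); cumcount() then numbers the rows in each group.
df = df.with_columns(
    cumcount=pl.col("num")
    .cumcount()
    .over(pl.col("num").cumsum()),
)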
Output:
>>> df
shape: (8, 2)
┌─────┬──────────┐
│ num ┆ cumcount │
│ --- ┆ ---      │
│ i64 ┆ u32      │
╞═════╪══════════╡
│ 0   ┆ 0        │
│ 29  ┆ 0        │
│ 28  ┆ 0        │
│ 4   ┆ 0        │
│ 0   ┆ 1        │
│ 0   ┆ 2        │
│ 13  ┆ 0        │
│ 0   ┆ 1        │
└─────┴──────────┘
The only "inaccuracy" here is that the first 0 starts with a cumcount of 0 rather than 1.
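To see why grouping by the cumulative sum behaves this way, here is a rough plain-Python sketch of the same logic. It is not the polars implementation, and the helper name `cumcount_over_cumsum` is made up for illustration. It assumes the values are non-negative, so the cumulative sum only stays flat across zeros.

```python
from itertools import accumulate


def cumcount_over_cumsum(nums):
    """Zero-based running count of rows that share the same cumulative sum."""
    seen = {}    # cumulative sum -> number of rows already seen with that sum
    counts = []
    for key in accumulate(nums):
        counts.append(seen.get(key, 0))
        seen[key] = seen.get(key, 0) + 1
    return counts


nums = [0, 29, 28, 4, 0, 0, 13, 0]
print(list(accumulate(nums)))      # [0, 29, 57, 61, 61, 61, 74, 74]
print(cumcount_over_cumsum(nums))  # [0, 0, 0, 0, 1, 2, 0, 1]
```

The leading 0 is the first row of its own group (cumulative sum 0), so it is counted as 0, just like every non-zero value.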
To "remedy" this, one way is to temporarily prepend a copy of the first row before applying the procedure above:
df = (
    pl.concat([df.head(1), df])
    .with_columns(
        cumcount=pl.col("num")
        .cumcount()
        .over(pl.col("num").cumsum()),
    )
    .tail(-1)
)
Output:
>>> df
shape: (8, 2)
┌─────┬──────────┐
│ num ┆ cumcount │
│ --- ┆ ---      │
│ i64 ┆ u32      │
╞═════╪══════════╡
│ 0   ┆ 1        │
│ 29  ┆ 0        │
│ 28  ┆ 0        │
│ 4   ┆ 0        │
│ 0   ┆ 1        │
│ 0   ┆ 2        │
│ 13  ┆ 0        │
│ 0   ┆ 1        │
└─────┴──────────┘
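The prepend trick can be checked the same way without polars. This is only a sketch under the same assumption of non-negative values, and `cumcount_with_prepend` is a hypothetical name, not a library function.

```python
from itertools import accumulate


def cumcount_with_prepend(nums):
    """Duplicate the first value, count within cumulative-sum groups, drop the extra row."""
    padded = nums[:1] + nums  # temporary copy of the first row
    seen = {}                 # cumulative sum -> rows already seen with that sum
    counts = []
    for key in accumulate(padded):
        counts.append(seen.get(key, 0))
        seen[key] = seen.get(key, 0) + 1
    return counts[1:]         # discard the count of the prepended row


print(cumcount_with_prepend([0, 29, 28, 4, 0, 0, 13, 0]))  # [1, 0, 0, 0, 1, 2, 0, 1]
print(cumcount_with_prepend([5, 0, 3]))                    # [0, 1, 0]
```

When the data starts with 0, the copy lands in the same group as the original row, which bumps that row's count to 1. When it starts with a non-zero value, as in the second call, the copy ends up in a group of its own and the counts are unchanged.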