Python Polars: Number of rows since last value >0

Question

Given a polars DataFrame column like

[0, 29, 28, 4, 0, 0, 13, 0]

how can I get a new column that counts the rows since the last value > 0, like

[1, 0, 0, 0, 1, 2, 0, 1]

The solution should preferably work with .over() for grouped values and optionally an additional rolling window function like rolling_mean().

I know of the corresponding question for pandas but couldn't manage to translate its answer to polars.

score 2 · Answer 1 · answered Aug 27 '23 at 21:43

Here's one way: use rle_id to label each run of consecutive zero or non-zero values, count the rows within each run with over, and keep that count only for the runs of 0 via when/then:

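import polars as pl

df = pl.from_dict({'a': [0, 29, 0, 28, 4, 0, 0, 0, 13, 0, 0, 46, 47, 0]})

# rle_id() labels each run of consecutive zero / non-zero values;
# cumcount() is 0-based within a run here, hence the 1 +.
# Later polars releases renamed cumcount to cum_count, so check your version.
df.with_columns(
    b=pl.when(pl.col('a') == 0)
    .then(1 + pl.col('a').cumcount().over(pl.col('a').ne(0).rle_id()))
    .otherwise(0)
)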

shape: (14, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ u32 │
╞═════╪═════╡
│ 0   ┆ 1   │
│ 29  ┆ 0   │
│ 0   ┆ 1   │
│ 28  ┆ 0   │
│ 4   ┆ 0   │
│ 0   ┆ 1   │
│ 0   ┆ 2   │
│ 0   ┆ 3   │
│ 13  ┆ 0   │
│ 0   ┆ 1   │
│ 0   ┆ 2   │
│ 46  ┆ 0   │
│ 47  ┆ 0   │
│ 0   ┆ 1   │
└─────┴─────┘
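
For checking the result independently of polars, the same run-based logic can be sketched in plain Python. This is only a reference sketch, not part of the polars solution: the function name rows_since_last_positive is made up for illustration, and itertools.groupby stands in for rle_id by splitting the list into consecutive runs of zero and non-zero values.

```python
from itertools import groupby


def rows_since_last_positive(values):
    """1-based position inside each run of zeros, 0 for non-zero values."""
    out = []
    # groupby yields consecutive runs that share the same key (zero / non-zero),
    # which mirrors grouping over pl.col('a').ne(0).rle_id().
    for is_zero, run in groupby(values, key=lambda v: v == 0):
        run_len = sum(1 for _ in run)
        if is_zero:
            out.extend(range(1, run_len + 1))  # 1, 2, 3, ... within the run
        else:
            out.extend([0] * run_len)          # non-zero rows reset the count
    return out


a = [0, 29, 0, 28, 4, 0, 0, 0, 13, 0, 0, 46, 47, 0]
print(rows_since_last_positive(a))
# [1, 0, 1, 0, 0, 1, 2, 3, 0, 1, 2, 0, 0, 1]
```

The printed list matches column b in the table above.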

Mateen Ulhaq · Answer 2 · 2023-08-28T00:02:35.820

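This is effectively a translation of this pandas answer: it groups rows by the cumulative sum of num, which stays constant across a run of zeros, and counts the rows within each group. It nearly does what you want (simplified, courtesy of @jqurious):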

import polars as pl

df = pl.DataFrame({"num": [0, 29, 28, 4, 0, 0, 13, 0]})

df = df.with_columns(
    cumcount=pl.col("num")
    .cumcount()
    .over(pl.col("num").cumsum()),
)

Output:

>>> df
shape: (8, 2)
┌─────┬──────────┐
│ num ┆ cumcount │
│ --- ┆ ---      │
│ i64 ┆ u32      │
╞═════╪══════════╡
│ 0   ┆ 0        │
│ 29  ┆ 0        │
│ 28  ┆ 0        │
│ 4   ┆ 0        │
│ 0   ┆ 1        │
│ 0   ┆ 2        │
│ 13  ┆ 0        │
│ 0   ┆ 1        │
└─────┴──────────┘
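
To see why grouping by the cumulative sum works, here is a plain-Python sketch of the same idea. It assumes the values are never negative (as in the question), so the running total only changes on a non-zero row; the helper name cumcount_over_cumsum is hypothetical and only mimics what the polars expression computes.

```python
from collections import defaultdict
from itertools import accumulate


def cumcount_over_cumsum(values):
    """Return (group keys, 0-based count of rows seen so far in each group)."""
    keys = list(accumulate(values))  # running total, like cumsum()
    seen = defaultdict(int)          # rows seen so far per key, like cumcount().over(key)
    counts = []
    for key in keys:
        counts.append(seen[key])
        seen[key] += 1
    return keys, counts


keys, counts = cumcount_over_cumsum([0, 29, 28, 4, 0, 0, 13, 0])
print(keys)    # [0, 29, 57, 61, 61, 61, 74, 74]
print(counts)  # [0, 0, 0, 0, 1, 2, 0, 1]
```

Every non-zero value starts a new key, and the zeros that follow it share that key, so the count within a key is the number of rows since the last non-zero value.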

The only "inaccuracy" here is that the leading 0 gets a cumcount of 0 rather than the 1 requested in the question.

To "remedy" this, one way is to temporarily prepend a copy of the first row before applying the procedure above, then drop that extra row again with tail(-1):

df = (
    pl.concat([df.head(1), df])
    .with_columns(
        cumcount=pl.col("num")
        .cumcount()
        .over(pl.col("num").cumsum()),
    )
    .tail(-1)
)

Output:

>>> df
shape: (8, 2)
┌─────┬──────────┐
│ num ┆ cumcount │
│ --- ┆ ---      │
│ i64 ┆ u32      │
╞═════╪══════════╡
│ 0   ┆ 1        │
│ 29  ┆ 0        │
│ 28  ┆ 0        │
│ 4   ┆ 0        │
│ 0   ┆ 1        │
│ 0   ┆ 2        │
│ 13  ┆ 0        │
│ 0   ┆ 1        │
└─────┴──────────┘
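
The prepend-and-drop step can be checked in plain Python as well. This is a sketch under the same assumption of non-negative values, with hypothetical helper names; it is not the polars code itself. A duplicated leading 0 lands in the same cumulative-sum group as the original, which bumps the first count to 1, while a duplicated leading non-zero value changes nothing.

```python
from itertools import accumulate


def count_within_cumsum_groups(values):
    """0-based row count within each run of equal cumulative sums."""
    out, prev_key, n = [], None, 0
    for key in accumulate(values):
        n = n + 1 if key == prev_key else 0  # same running total: same group
        out.append(n)
        prev_key = key
    return out


def rows_since_last_positive(values):
    """Prepend a copy of the first row, count, then drop the extra row."""
    if not values:
        return []
    padded = [values[0]] + list(values)           # like pl.concat([df.head(1), df])
    return count_within_cumsum_groups(padded)[1:]  # like .tail(-1)


print(rows_since_last_positive([0, 29, 28, 4, 0, 0, 13, 0]))
# [1, 0, 0, 0, 1, 2, 0, 1]
print(rows_since_last_positive([29, 0]))
# [0, 1]
```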

If it helps, you can write it without apply: `df.with_columns(cumcount = pl.col("num").cumcount().over(pl.col("num").cumsum()))` — jqurious, Aug 27 '23 at 22:52
