Given a Polars DataFrame column like

[0, 29, 28, 4, 0, 0, 13, 0]

how can I get a new column that counts consecutive zeros (restarting at 1 for each run, with 0 for every non-zero value), like

[1, 0, 0, 0, 1, 2, 0, 1]

The solution should preferably work with .over() for grouped values and optionally an additional rolling window function like rolling_mean().

I know of the corresponding pandas question but couldn't manage to translate it.

Mateen Ulhaq
OliverHennhoefer

2 Answers

Here's one way: use rle_id to label each run of consecutive zero or non-zero values, count within each run via .over(), and keep that count only for the zero runs with a when/then:

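import polars as pl

df = pl.from_dict({'a': [0, 29, 0, 28, 4, 0, 0, 0, 13, 0, 0, 46, 47, 0]})

# rle_id() gives every run of consecutive equal values of (a != 0) its own id;
# cumcount() is 0-based within each run here, hence the `1 +`.
# (Later Polars releases rename cumcount() to cum_count(); if that method is
# 1-based in your version, drop the `1 +`.)
df.with_columns(
    b=pl.when(pl.col('a') == 0)
    .then(1 + pl.col('a').cumcount().over(pl.col('a').ne(0).rle_id()))
    .otherwise(0)
)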
shape: (14, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ u32 │
╞═════╪═════╡
│ 0   ┆ 1   │
│ 29  ┆ 0   │
│ 0   ┆ 1   │
│ 28  ┆ 0   │
│ 4   ┆ 0   │
│ 0   ┆ 1   │
│ 0   ┆ 2   │
│ 0   ┆ 3   │
│ 13  ┆ 0   │
│ 0   ┆ 1   │
│ 0   ┆ 2   │
│ 46  ┆ 0   │
│ 47  ┆ 0   │
│ 0   ┆ 1   │
└─────┴─────┘
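
For checking any Polars variant against the intended result, the run-based logic can be sketched in plain Python. This is only a reference implementation of the same idea (split the column into runs of zero / non-zero values, number the zero runs from 1); the helper name zero_run_counts is made up for illustration and is not part of Polars.

```python
from itertools import groupby


def zero_run_counts(values):
    """Return 1, 2, 3, ... along each run of zeros and 0 for non-zero values."""
    out = []
    # groupby() splits the sequence into consecutive runs sharing the same key,
    # which mirrors what rle_id() does on the boolean column.
    for is_zero, run in groupby(values, key=lambda v: v == 0):
        n = len(list(run))
        out.extend(range(1, n + 1) if is_zero else [0] * n)
    return out


a = [0, 29, 0, 28, 4, 0, 0, 0, 13, 0, 0, 46, 47, 0]
print(zero_run_counts(a))
```

Running it on the column above should reproduce column b from the table.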
Wayoshi

This is effectively a translation of this pandas answer, and nearly does what you want (simplified, courtesy of @jqurious):

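import polars as pl

df = pl.DataFrame({"num": [0, 29, 28, 4, 0, 0, 13, 0]})

# The running sum only changes on non-zero rows, so each non-zero row and the
# zeros that follow it share one partition; cumcount() numbers them 0, 1, 2, ...
# (Later Polars releases rename these methods to cum_count()/cum_sum().)
df = df.with_columns(
    cumcount=pl.col("num")
    .cumcount()
    .over(pl.col("num").cumsum()),
)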

Output:

>>> df
shape: (8, 2)
┌─────┬──────────┐
│ num ┆ cumcount │
│ --- ┆ ---      │
│ i64 ┆ u32      │
╞═════╪══════════╡
│ 0   ┆ 0        │
│ 29  ┆ 0        │
│ 28  ┆ 0        │
│ 4   ┆ 0        │
│ 0   ┆ 1        │
│ 0   ┆ 2        │
│ 13  ┆ 0        │
│ 0   ┆ 1        │
└─────┴──────────┘

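The only "inaccuracy" here is that the first 0 starts with a cumcount of 0: every other run of zeros follows a non-zero row that takes the 0 of its partition, but a leading zero has no such row in front of it.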

To "remedy" this, one way is to temporarily prepend a copy of the first row before applying the procedure above, then drop that extra row again with .tail(-1):

df = (
    pl.concat([df.head(1), df])
    .with_columns(
        cumcount=pl.col("num")
        .cumcount()
        .over(pl.col("num").cumsum()),
    )
    .tail(-1)
)

Output:

>>> df
shape: (8, 2)
┌─────┬──────────┐
│ num ┆ cumcount │
│ --- ┆ ---      │
│ i64 ┆ u32      │
╞═════╪══════════╡
│ 0   ┆ 1        │
│ 29  ┆ 0        │
│ 28  ┆ 0        │
│ 4   ┆ 0        │
│ 0   ┆ 1        │
│ 0   ┆ 2        │
│ 13  ┆ 0        │
│ 0   ┆ 1        │
└─────┴──────────┘
Mateen Ulhaq
    If it helps, you can write it without apply: `df.with_columns(cumcount = pl.col("num").cumcount().over(pl.col("num").cumsum()))` – jqurious Aug 27 '23 at 22:52