This turned out to be a more complicated problem than I expected. All I want to do is replace each run of zeros with values that count up by 1 from the preceding 1, resetting at the next 1.

My dataset is 1m+ rows, so I'm trying to vectorize this operation if possible.

Here's a sample column:

import pandas as pd

# Define the input dataframe
df = pd.DataFrame({'col': [1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0]})

0     1
1     0
2     1
3     0
4     1
5     1
6     0
7     0
8     0
9     0
10    1
11    0
12    1
13    1
14    0

Goal Result:

0     1
1     2
2     1
3     2
4     1
5     1
6     2
7     3
8     4
9     5
10    1
11    2
12    1
13    1
14    2

I've tried a number of different methods with ffill() and cumsum(), but the issue with cumsum() on its own is that it keeps accumulating and never resets the count at the next 1.

nborel

2 Answers

Group by the cumulative sum of column col, then apply cumcount() and add 1 so that each group starts counting at 1:

df['col'] = df.groupby(df['col'].cumsum())['col'].cumcount() + 1

    col
0     1
1     2
2     1
3     2
4     1
5     1
6     2
7     3
8     4
9     5
10    1
11    2
12    1
13    1
14    2
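
To see why grouping on the cumulative sum works, it may help to look at the intermediate group key. The sketch below reuses the sample column from the question; the names group_key and result are only illustrative and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'col': [1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0]})

# Every 1 bumps the running total, so a 1 and the zeros that follow it
# all share the same key value.
group_key = df['col'].cumsum()

# cumcount() numbers the rows inside each group starting at 0; add 1 to start at 1.
result = df.groupby(group_key).cumcount() + 1

# Show the input, the group key, and the final counter side by side.
print(pd.DataFrame({'col': df['col'], 'group_key': group_key, 'result': result}))
```

Because the key only changes on a 1, the counter restarts exactly there, which is the reset that a plain cumsum() lacks.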
RomanPerekhrest

Temporarily replace each 0 with 1, group the rows by the cumulative sum of the original column (so each real 1 and the zeros that follow it form one group), then take the cumulative sum within each group:

df['col2'] = df['col'].replace(0, 1).groupby(df['col'].cumsum()).cumsum()
print(df)

# Output
    col  col2
0     1     1
1     0     2
2     1     1
3     0     2
4     1     1
5     1     1
6     0     2
7     0     3
8     0     4
9     0     5
10    1     1
11    0     2
12    1     1
13    1     1
14    0     2
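
Since the question mentions 1m+ rows, a groupby-free variant may also be worth timing. This is a different technique from the answer above, not a restatement of it: a minimal NumPy sketch that finds the position of the most recent 1 with a running maximum, assuming the column contains only 0s and 1s. The names values, positions, and last_one are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0]})

values = df['col'].to_numpy()
positions = np.arange(len(values))

# Position of the most recent 1 at or before each row
# (stays 0 while no 1 has been seen yet).
last_one = np.maximum.accumulate(np.where(values == 1, positions, 0))

# Distance from that 1, shifted so the 1 itself maps to 1.
df['col2'] = positions - last_one + 1
print(df)
```

On the sample data this gives the same col2 as the groupby version. Whether it is actually faster on a given dataset is something to measure rather than assume.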
L0tad
Corralien