Let's say I have a Pandas series like so:

import pandas as pd

s = pd.Series([1, 0, 0, 1, 0, 0, 0], name='series')

How would I add a column that counts the rows since the last value greater than 0, like so:

pd.DataFrame({
    'series': [1, 0, 0, 1, 0, 0, 0],
    'row_num': [0, 1, 2, 0, 1, 2, 3]
})
Chris C

2 Answers

8

Try this:

s.groupby(s.cumsum()).cumcount()

Output:

0    0
1    1
2    2
3    0
4    1
5    2
6    3
dtype: int64
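
To spell out why this works, here is a minimal sketch, assuming `s` is the series from the question. Every value greater than 0 bumps the running total, so the cumulative sum acts as a group label, and `cumcount()` then numbers the rows inside each group from 0. Grouping on `(s > 0).cumsum()` instead of `s.cumsum()` is an assumption added here to match the ">0" wording in case the series can contain negative values; for the example data both give the same labels. Rows before the first positive value would fall into group 0 and be counted from the start of the series.

```python
import pandas as pd

s = pd.Series([1, 0, 0, 1, 0, 0, 0], name='series')

# Each value > 0 starts a new group; the running total of the boolean
# mask labels every row with the group it belongs to.
group_id = (s > 0).cumsum()          # 1, 1, 1, 2, 2, 2, 2

# cumcount() numbers the rows inside each group, starting from 0.
row_num = s.groupby(group_id).cumcount()

# Put the original series and the counter side by side.
result = s.to_frame().assign(row_num=row_num)
print(result)
```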
Scott Boston
  • And to get this result in the desired format: `s.to_frame('series').join(s.groupby(s.cumsum()).cumcount().to_frame('row_num'))` – Alexander Jul 08 '19 at 22:25
  • @chrisc Would you consider [accepting](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work?answertab=votes#tab-top) this solution? – Scott Boston Jul 10 '19 at 18:28
1

NumPy

  • Find the positions where the series/array is nonzero
  • Calculate the differences from one position to the next (the length of each run)
  • Repeat each position by its run length and subtract the result from the sequence 0, 1, ..., n-1

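import numpy as np

i = np.flatnonzero(s)                 # positions of the nonzero values
n = len(s)
delta = np.diff(np.append(i, n))      # length of each run
r = np.arange(n)                      # 0, 1, ..., n-1
r - r[i].repeat(delta)                # subtract each run's start position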

array([0, 1, 2, 0, 1, 2, 3])
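
As far as I can tell, the `repeat`-based version above relies on the first element being nonzero; if the array starts with zeros, the repeated array comes out shorter than `r` and the subtraction no longer lines up. A different technique that sidesteps this is a running maximum over the start positions. The sketch below is only an illustration: the helper name `rows_since_last_positive` is hypothetical, and counting leading rows from the start of the array is an assumption about the desired behavior.

```python
import numpy as np

def rows_since_last_positive(values):
    """Count rows since the most recent element > 0 (hypothetical helper)."""
    a = np.asarray(values)
    r = np.arange(len(a))
    # Position of each positive element, else 0, so that rows before the
    # first positive element count from the start of the array.
    starts = np.where(a > 0, r, 0)
    # The running maximum carries the latest start position forward.
    last_start = np.maximum.accumulate(starts)
    return r - last_start

print(rows_since_last_positive([1, 0, 0, 1, 0, 0, 0]))  # [0 1 2 0 1 2 3]
print(rows_since_last_positive([0, 0, 1, 0]))           # [0 1 0 1]
```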
piRSquared