Enumerating dataframe based on a column

Question

I am dealing with a time-series dataframe that looks like this except with more than thousands of rows. I want to make a new column that enumerates the blocks of rows with the same values of 'sign'. i.e. 0th row would be 0, 1st row to 23rd row would be 1, 24th row to 30th row would be 2 etc...(the chronological order is important) What's the most pythonic way to accomplish this? Thank you in advance

    Date       sign
0   2011-01-27  1
1   2011-01-28  -1
2   2011-01-31  -1
3   2011-02-01  -1
4   2011-02-02  -1
5   2011-02-07  -1
6   2011-02-08  -1
7   2011-02-09  -1
8   2011-02-10  -1
9   2011-02-11  -1
10  2011-02-14  -1
11  2011-02-15  -1
12  2011-02-16  -1
13  2011-02-17  -1
14  2011-02-18  -1
15  2011-02-21  -1
16  2011-02-22  -1
17  2011-02-23  -1
18  2011-02-24  -1
19  2011-02-25  -1
20  2011-02-28  -1
21  2011-03-01  -1
22  2011-03-02  -1
23  2011-03-03  -1
24  2011-03-04  1
25  2011-03-07  1
26  2011-03-08  1
27  2011-03-09  1
28  2011-03-10  1
29  2011-03-11  1
30  2011-03-14  1
31  2011-03-15  -1
32  2011-03-16  -1
33  2011-03-17  -1
34  2011-03-18  -1
35  2011-03-21  -1
36  2011-03-22  -1
37  2011-03-23  -1
38  2011-03-24  -1
39  2011-03-25  -1
40  2011-03-28  -1
41  2011-03-29  1
42  2011-03-30  1

score 2 · Accepted Answer · answered Dec 11 '18 at 16:41

You can get the cumsum of where the sign changes, obtained using diff:

df['new_column'] = (df.sign.diff()!=0).cumsum()-1

>>> df
          Date  sign  new_column
0   2011-01-27     1      0
1   2011-01-28    -1      1
2   2011-01-31    -1      1
3   2011-02-01    -1      1
4   2011-02-02    -1      1
5   2011-02-07    -1      1
6   2011-02-08    -1      1
7   2011-02-09    -1      1
8   2011-02-10    -1      1
9   2011-02-11    -1      1
10  2011-02-14    -1      1
11  2011-02-15    -1      1
12  2011-02-16    -1      1
13  2011-02-17    -1      1
14  2011-02-18    -1      1
15  2011-02-21    -1      1
16  2011-02-22    -1      1
17  2011-02-23    -1      1
18  2011-02-24    -1      1
19  2011-02-25    -1      1
20  2011-02-28    -1      1
21  2011-03-01    -1      1
22  2011-03-02    -1      1
23  2011-03-03    -1      1
24  2011-03-04     1      2
25  2011-03-07     1      2
26  2011-03-08     1      2
27  2011-03-09     1      2
28  2011-03-10     1      2
29  2011-03-11     1      2
30  2011-03-14     1      2
31  2011-03-15    -1      3
32  2011-03-16    -1      3
33  2011-03-17    -1      3
34  2011-03-18    -1      3
35  2011-03-21    -1      3
36  2011-03-22    -1      3
37  2011-03-23    -1      3
38  2011-03-24    -1      3
39  2011-03-25    -1      3
40  2011-03-28    -1      3
41  2011-03-29     1      4
42  2011-03-30     1      4

Thank you @sacul. It seems to work. I wonder what works on the same problem with non 1,-1 columns though. — mathguy, Dec 11 '18 at 16:45
Yes, shouldn't be a problem! It will always update the count whenever the sign changes value, regardless of whether it's from -1 to 1 or -9999 to 9999 (or whatever numeric values) — sacuL, Dec 11 '18 at 16:46

yatu · Answer 2 · 2018-12-11T16:45:32.270

1

You could do:

df['count'] = df.sign.ne(df.sign.shift(1)).cumsum()

  Date  sign  count
0   2011-01-27     1      1
1   2011-01-28    -1      2
2   2011-01-31    -1      2
3   2011-02-01    -1      2
4   2011-02-02    -1      2
5   2011-02-07    -1      2
.
.
.
23  2011-03-03    -1      2
24  2011-03-04     1      3
25  2011-03-07     1      3
26  2011-03-08     1      3
27  2011-03-09     1      3

edited Dec 11 '18 at 16:45

answered Dec 11 '18 at 16:39

yatu

86,083
12
84
139

Thank you @nixon. – mathguy Dec 11 '18 at 16:46

Enumerating dataframe based on a column

2 Answers2