3

I am dealing with a time-series dataframe that looks like this except with more than thousands of rows. I want to make a new column that enumerates the blocks of rows with the same values of 'sign'. i.e. 0th row would be 0, 1st row to 23rd row would be 1, 24th row to 30th row would be 2 etc...(the chronological order is important) What's the most pythonic way to accomplish this? Thank you in advance

    Date       sign
0   2011-01-27  1
1   2011-01-28  -1
2   2011-01-31  -1
3   2011-02-01  -1
4   2011-02-02  -1
5   2011-02-07  -1
6   2011-02-08  -1
7   2011-02-09  -1
8   2011-02-10  -1
9   2011-02-11  -1
10  2011-02-14  -1
11  2011-02-15  -1
12  2011-02-16  -1
13  2011-02-17  -1
14  2011-02-18  -1
15  2011-02-21  -1
16  2011-02-22  -1
17  2011-02-23  -1
18  2011-02-24  -1
19  2011-02-25  -1
20  2011-02-28  -1
21  2011-03-01  -1
22  2011-03-02  -1
23  2011-03-03  -1
24  2011-03-04  1
25  2011-03-07  1
26  2011-03-08  1
27  2011-03-09  1
28  2011-03-10  1
29  2011-03-11  1
30  2011-03-14  1
31  2011-03-15  -1
32  2011-03-16  -1
33  2011-03-17  -1
34  2011-03-18  -1
35  2011-03-21  -1
36  2011-03-22  -1
37  2011-03-23  -1
38  2011-03-24  -1
39  2011-03-25  -1
40  2011-03-28  -1
41  2011-03-29  1
42  2011-03-30  1
mathguy
  • 1,450
  • 1
  • 16
  • 33

2 Answers2

2

You can get the cumsum of where the sign changes, obtained using diff:

df['new_column'] = (df.sign.diff()!=0).cumsum()-1

>>> df
          Date  sign  new_column
0   2011-01-27     1      0
1   2011-01-28    -1      1
2   2011-01-31    -1      1
3   2011-02-01    -1      1
4   2011-02-02    -1      1
5   2011-02-07    -1      1
6   2011-02-08    -1      1
7   2011-02-09    -1      1
8   2011-02-10    -1      1
9   2011-02-11    -1      1
10  2011-02-14    -1      1
11  2011-02-15    -1      1
12  2011-02-16    -1      1
13  2011-02-17    -1      1
14  2011-02-18    -1      1
15  2011-02-21    -1      1
16  2011-02-22    -1      1
17  2011-02-23    -1      1
18  2011-02-24    -1      1
19  2011-02-25    -1      1
20  2011-02-28    -1      1
21  2011-03-01    -1      1
22  2011-03-02    -1      1
23  2011-03-03    -1      1
24  2011-03-04     1      2
25  2011-03-07     1      2
26  2011-03-08     1      2
27  2011-03-09     1      2
28  2011-03-10     1      2
29  2011-03-11     1      2
30  2011-03-14     1      2
31  2011-03-15    -1      3
32  2011-03-16    -1      3
33  2011-03-17    -1      3
34  2011-03-18    -1      3
35  2011-03-21    -1      3
36  2011-03-22    -1      3
37  2011-03-23    -1      3
38  2011-03-24    -1      3
39  2011-03-25    -1      3
40  2011-03-28    -1      3
41  2011-03-29     1      4
42  2011-03-30     1      4
sacuL
  • 49,704
  • 8
  • 81
  • 106
  • Thank you @sacul. It seems to work. I wonder what works on the same problem with non 1,-1 columns though. – mathguy Dec 11 '18 at 16:45
  • Yes, shouldn't be a problem! It will always update the count whenever the sign changes value, regardless of whether it's from -1 to 1 or -9999 to 9999 (or whatever numeric values) – sacuL Dec 11 '18 at 16:46
1

You could do:

df['count'] = df.sign.ne(df.sign.shift(1)).cumsum()

  Date  sign  count
0   2011-01-27     1      1
1   2011-01-28    -1      2
2   2011-01-31    -1      2
3   2011-02-01    -1      2
4   2011-02-02    -1      2
5   2011-02-07    -1      2
.
.
.
23  2011-03-03    -1      2
24  2011-03-04     1      3
25  2011-03-07     1      3
26  2011-03-08     1      3
27  2011-03-09     1      3
yatu
  • 86,083
  • 12
  • 84
  • 139