1

I have a huge dataframe with multiple columns as below . I am trying to find cummulative sum of time when value in b is -1 which is preceeded by 1. I have tried cummulative sum but it's not working.

I am looking for cumulative sum of values only when the values in b are -1 for that group alone. Once the value in b changes to 0 I want cummulative sum to reset to 0

My cummulative sum values should match with the values in cummulative_sum_column

**date time**    **id**  **value**  **time**           **a****b**   **cummulative sum**
11/1/2021 0:05   v01111     4           0               0     0
11/1/2021 1:30   v01111     11      57.302              1     1 
11/1/2021 4:19   v01111     1       10129.88            0    -1 
11/1/2021 4:19   v01111     1       0.49                0    -1        10130.37
11/1/2021 9:30   v01111     2       0.202               0     0 
11/1/2021 9:31   v01111     11      58.699              1     1 
11/1/2021 10:31  v01111     1       3573.728            0    -1 
11/1/2021 10:31  v01111     1       0.29                0    -1       3574.018
11/2/2021 21:15  v01111     2       0.2                 0     0 
12/9/2021 20:17  v01112     11      58.525              1    0  
12/9/2021 20:53  v01113     11      2145.745            0    -1 
12/9/2021 20:53  v01114     1       0.016               0    -1           2146
12/9/2021 22:05  v01150     0       0.01                0    0  
12/9/2021 22:05  v01151     0       1.188               0    0  
12/9/2021 22:05  v01152     1       0.312               0   0   
12/9/2021 22:05  v01153     1       1.49                0   -1  
12/9/2021 22:09  v01153     1       4                   1   0   
12/10/2021 22:07 v01153     1       6                   0   -1            6

I tried below but the code calculates cummulative sum of all values in when the values in b == -1 it doesnot reset to 0 once the value in b is 0

df['test']= df.loc[(df['b'] == -1) & (df['b'].shift(+1) == 1), 'time'].cumsum()
Laurent
  • 12,287
  • 7
  • 21
  • 37
anonymous13
  • 581
  • 1
  • 5
  • 17
  • You never mention column "a" in the problem statement. Why are you using df['a'].shift(+1)? You said "when value in b is -1 which is preceeded by 1" This means in the same column (b) right? – Pab Jan 25 '22 at 01:45
  • Yes, I have edited my question. I basically wan cumulative sum of values only when the value sin b are -1 for that group alone. Once the value in b changes to 0 I want cummulative sum to reset to 0 – anonymous13 Jan 25 '22 at 02:07
  • Is the code working until this point? ```df['test']= df.loc[(df['b'] == -1) & (df['b'].shift(+1) == 1), 'time']```. Is df.test as expected? – Pab Jan 25 '22 at 02:10
  • Please take a minute and repost your question with a correct statement. What you are saying in the comments and in your problem statement are incongruent. – Pab Jan 25 '22 at 02:21
  • I did repost my question – anonymous13 Jan 25 '22 at 16:40

2 Answers2

0

You could try like this:

for i, row in df.iterrows():
    if row["b"] == -1 and i != len(df) - 1:
        cumsum += row["time"]
    if i != 0 and row["b"] == 0:
        df.loc[i - 1, "cumsum"] = cumsum
        cumsum = 0
    if row["b"] == -1 and i == len(df) - 1:
        cumsum += row["time"]
        df.loc[i, "cumsum"] = cumsum

So that:

print(df.fillna(0))
# Outputs
          date       time      id  value  a  b     cumsum
0    11/1/2021      0.000  v01111      4  0  0      0.000
1    11/1/2021     57.302  v01111     11  1  1      0.000
2    11/1/2021  10129.880  v01111      1  0 -1      0.000
3    11/1/2021      0.490  v01111      1  0 -1  10130.370
4    11/1/2021      0.202  v01111      2  0  0      0.000
5    11/1/2021     58.699  v01111     11  1  1      0.000
6    11/1/2021   3573.728  v01111      1  0 -1      0.000
7    11/1/2021      0.290  v01111      1  0 -1   3574.018
8    11/2/2021      0.200  v01111      2  0  0      0.000
9    12/9/2021     58.525  v01112     11  1  0      0.000
10   12/9/2021   2145.745  v01113     11  0 -1      0.000
11   12/9/2021      0.016  v01114      1  0 -1   2145.761
12   12/9/2021      0.010  v01150      0  0  0      0.000
13   12/9/2021      1.188  v01151      0  0  0      0.000
14   12/9/2021      0.312  v01152      1  0  0      0.000
15   12/9/2021      1.490  v01153      1  0 -1      1.490
16   12/9/2021      4.000  v01153      1  1  0      0.000
17  12/10/2021      6.000  v01153      1  0 -1      6.000
Laurent
  • 12,287
  • 7
  • 21
  • 37
0

We group by runs of the same value in 'b' and apply the sum to groups; then we zero out the sums where 'b' is not -1. 'cumsum' shows the sum of individual groups of 'b'==-1 for each row in the group

grouper = (df['b'] != df['b'].shift()).cumsum()
df['cumsum'] = df.groupby(grouper, group_keys = False)['time'].transform(sum)
df.loc[df['b'] != -1,'cumsum'] = 0
df

output

         time    a    b    cumsum
--  ---------  ---  ---  --------
 0      0        0    0      0
 1     57.302    1    1      0
 2  10129.9      0   -1  10130.4
 3      0.49     0   -1  10130.4
 4      0.202    0    0      0
 5     58.699    1    1      0
 6   3573.73     0   -1   3574.02
 7      0.29     0   -1   3574.02
 8      0.2      0    0      0
 9     58.525    1    0      0
10   2145.74     0   -1   2145.76
11      0.016    0   -1   2145.76
12      0.01     0    0      0
13      1.188    0    0      0
14      0.312    0    0      0
15      1.49     0   -1      1.49
16      4        1    0      0
17      6        0   -1      6
piterbarg
  • 8,089
  • 2
  • 6
  • 22