3

I'm trying to find a vectorized solution in pandas that is quite common in spreadsheets which is to cumsum while skipping or setting fixed values on a condition based on the result of the actual cumsum. I have the following:

    A
1   0
2  -1
3   2
4   3
5  -2
6  -3
7   1
8  -1
9   1
10 -2
11  1
12  2 
13 -1
14 -2

What I need is to add a second column with the cumsum of 'A' and if one of these sums gives a positive value replace it with 0 and continue the cumsum using that 0. At the same time if the cumsum gives a negative value that is lower than the lowest value in column A recorded after a 0 in column B I will need to replace it with that lowest value in column A. I know this is quite a problem but is there a vectorized solution for this? Maybe using an auxiliary column. The result should look like this:

    A   B
1   0   0
2  -1  -1  # -1+0 = -1
3   2   0  # -1 + 2 = 1 but  1>0 so this is 0
4   3   0  # same as previous row
5  -2  -2  # -2+0 = -2
6  -3  -3  # -2-3 = -5 but the lowest value in column A since last 0 is -3 so this is replaced by -3
7   1  -2  #  1-3 = -2
8  -1  -3  # -1-2 = -3
9   1  -2  # -3 + 1 = -2
10 -2  -3  # -2-2 = -4 but the lowest value in column A since last 0 is -3 so this is replaced by -3 
11  1  -2  # -3 +1 = -2
12  2   0  # -2+2 = 0
13 -1  -1  # 0-1 = -1
14 -2  -2  # -1-2 = -3 but the lowest value in column A since last cap is -2 so this is -2 instead of -3

For the moment I made this but does not work 100% and again is not really efficient:

df['B'] = 0
df['B'][0] = 0
for x in range(len(df)-1):
    A = df['A'][x + 1]
    B = df['B'][x] + A
    if B >= 0:
        df['B'][x+1] = 0
    elif B < 0 and A < 0 and B < A:
        df['B'][x+1] = A
    else:
        df['B'][x + 1] = B
Gus
  • 193
  • 9
  • How do you get values from column B?, I'm trying using `pd.Series([0, -1, 2, 3, -2, -3, 1, -1, 1, -2, 1, 2, -1, -2]).cumsum()` and it's returning this array `[0, -1, 1, 4, 2, -1, 0, -1, 0, -2, -1, 1, 0, -2]` I could take this cumulative sum and perfomn necessary operations to get current values of your B column. But I can't figure out how to get `[Nan, -1, 0, 0, -2, -3, -2, -3, -2, -3, -2, 0, -1, -2] ` – user11717481 Nov 07 '21 at 23:25
  • The column B is the actual result that I need. A cumsum will not work to get to column B since i need to apply conditions on the cumsum to get column B. I´ll edit my question so is more clear. – Gus Nov 07 '21 at 23:40
  • you could use `df['A'].expanding(1).apply(function)` to run own `function` which will get: first ony one row, next 2 rows, next 3 rows, etc. This way I can almost get result but still have problem with checking the lowest value since last 0. – furas Nov 08 '21 at 00:46
  • I have a doubt with the minimum value if we replace it consecutively or we only take it from the previous assigned value to B for example `min = min or B [i-1]` – user11717481 Nov 08 '21 at 02:06
  • So the minimum value is obtained from column A and is updated every time column A gives a lower value then is reset when B = 0. – Gus Nov 08 '21 at 09:45

1 Answers1

1

Using df['A'].expanding(1).apply(function) I could run own function which first get only one row, next 2 rows, next 3 rows, etc. I doesn't give result from previous calculation and it needs to make all calculations again and again but it doesn't need global variables and hardcoded df['A']

Doc: Series.expanding

A = [0, -1, 2, 3, -2, -3, 1, -1, 1, -2, 1, 2, -1, -2]

import pandas as pd

df = pd.DataFrame({"A": A})

def function(values):
    #print(values)
    #print(type(valuse)
    #print(len(values))

    result = 0

    last_zero = 0

    for index, value in enumerate(values):
        result += value

        if result >= 0:
            result = 0
            last_zero = index
        else:
            minimal = min(values[last_zero:])
            #print(index, last_zero, minimal)
                        
            #if result < minimal:
            #   result = minimal
            result = max(result, minimal)
            
    #print('result:', result)
    return result

df['B'] = df['A'].expanding(1).apply(function)

df['B'] = df['B'].astype(int)

print(df)

Result:

    A  B
0   0  0
1  -1 -1
2   2  0
3   3  0
4  -2 -2
5  -3 -3
6   1 -2
7  -1 -3
8   1 -2
9  -2 -3
10  1 -2
11  2  0
12 -1 -1
13 -2 -2

The same but with normal apply() - it needs global variables and hardcoded df['A']

A = [0, -1, 2, 3, -2, -3, 1, -1, 1, -2, 1, 2, -1, -2]

import pandas as pd

df = pd.DataFrame({"A": A})

result = 0
last_zero = 0
index = 0

def function(value):
    global result
    global last_zero
    global index
    
    result += value

    if result >= 0:
        result = 0
        last_zero = index
    else:        
        minimal = min(df['A'][last_zero:])
        #print(index, last_zero, minimal)
                        
        #if result < minimal:
        #   result = minimal
        result = max(result, minimal)
       
    index += 1
    
    #print('result:', result)
    return result

df['B'] = df['A'].apply(function)
df['B'] = df['B'].astype(int)

print(df)

The same using normal for-loop

A = [0, -1, 2, 3, -2, -3, 1, -1, 1, -2, 1, 2, -1, -2]

import pandas as pd

df = pd.DataFrame({"A": A})

all_values = []

result = 0
last_zero = 0

for index, value in df['A'].iteritems():
    
    result += value
    
    if result >= 0:
        result = 0
        last_zero = index
    else:    
        minimal = min(df['A'][last_zero:])
        #print(index, last_zero, minimal)
                            
        #if result < minimal:
        #   result = minimal
        result = max(result, minimal)
           
    all_values.append(result)

df['B'] = all_values

print(df)
furas
  • 134,197
  • 12
  • 106
  • 148
  • I've seen that you put a lot of effort, I wanted to avoid for loops and if statements and also variables. I was thinking that It may be possible to create a dumb column with a ID number to identify each segment of the column between zeros, then get the minvalue using groupby to get the lowest value. – Gus Nov 08 '21 at 01:49
  • I also tried to do it without `for`-loop and expected that `expanding(1).apply()` will work like `cumsum()` but with own function - but it doesn't work and I couldn't find something better. – furas Nov 08 '21 at 11:38