-1

Take the following df:

df = pd.DataFrame({'col_1':['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
                   'val_1':[10, 20, 15, 25, 20, 15, 20, 30, 40],
                   'val_2':[-5, 0, 5, 10, 5, 10, 15, 10, 5]})

  col_1  val_1  val_2
0     A     10     -5
1     B     20      0
2     C     15      5
3     A     25     10
4     B     20      5
5     C     15     10
6     A     20     15
7     B     30     10
8     C     40      5

I need to create a new column with the inverted sign of val_2, if col_1 == 'A' and if val_2 > 0. This is straightforward:

df['new_col'] = df.apply(lambda x: -x.val_2 if x.col_1 == 'A' and x.val_2 > 0 else 0, axis=1)

  col_1  val_1  val_2  new_col
0     A     10     -5        0
1     B     20      0        0
2     C     15      5        0
3     A     25     10      -10
4     B     20      5        0
5     C     15     10        0
6     A     20     15      -15
7     B     30     10        0
8     C     40      5        0

But my final goal is just to have the incremental value in new_col, that is, the value of row(n) minus the cumulative sum of above rows.

The resulting df should look like this:

  col_1  val_1  val_2  new_col
0     A     10     -5        0
1     B     20      0        0
2     C     15      5        0
3     A     25     10      -10
4     B     20      5        0
5     C     15     10        0
6     A     20     15       -5 --> -15 - (-10) --> (-10 is the cumsum of previous rows)
7     B     30     10        0
8     C     40      5        0
Daniel Arges
  • 345
  • 3
  • 13
  • Does this answer your question? [Can I perform dynamic cumsum of rows in pandas?](https://stackoverflow.com/questions/54208023/can-i-perform-dynamic-cumsum-of-rows-in-pandas) – piterbarg Mar 22 '21 at 23:52

1 Answers1

1

It makes things easier if you use NaN instead of 0

cond = (df.col_1 == 'A') & (df.val_2 > 0)
df.loc[ cond, 'new_col' ] = -df.loc[ cond, 'val_2' ]


>>> df
  col_1  val_1  val_2  new_col
0     A     10     -5      NaN
1     B     20      0      NaN
2     C     15      5      NaN
3     A     25     10    -10.0
4     B     20      5      NaN
5     C     15     10      NaN
6     A     20     15    -15.0
7     B     30     10      NaN
8     C     40      5      NaN

You can dropna() to leave just the values.

>>> df.new_col.dropna().diff()
3    NaN
6   -5.0
Name: new_col, dtype: float64

Which you can pass to .update()

df.update(df.new_col.dropna().diff())

>>> df
  col_1  val_1  val_2  new_col
0     A     10     -5      NaN
1     B     20      0      NaN
2     C     15      5      NaN
3     A     25     10    -10.0
4     B     20      5      NaN
5     C     15     10      NaN
6     A     20     15     -5.0
7     B     30     10      NaN
8     C     40      5      NaN

You then put the 0 in with .fillna(0)