How to calculate percentage change with zero in pandas?

Question

I want to calculate the percentage change for the following data frame.

import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
                   'points': [12, 0, 19, 22, 0, 25, 0, 30],
                   'score': [12, 0, 19, 22, 0, 25, 0, 30] 
                   
                   })
print(df)

When I applied this step, it returns inf which is obvious because we are dividing by zero.

df['score'] = df.groupby('team', sort=False)['score'].apply(
     lambda x: x.pct_change()).to_numpy()

But if we see in each column the change from 0 to 19 the change is 100%, from 0 to 25 the change is 100%, and from 0 to 30 the change is 100%. So, I was wondering how can I calculate those values.

current result

Expected result is

No, I want to calculate the percentage change between two rows. — Bad Coder, Nov 18 '22 at 19:51
I am sorry I thought that was a percentage change. Then what it is? — Bad Coder, Nov 18 '22 at 19:55
You should update your example to add an actual case where the % change is not NaN/0/inf. The current 1/-1 only output is quite confusing. — mozway, Nov 18 '22 at 20:50

mozway · Answer 1 · 2022-11-18T20:47:58.217

1

So you just want to replace the infinite values with 1?

import numpy as np

df[['points', 'score']] = (
  df.groupby('team')
    .pct_change()
    .replace(np.inf, 1)
)

Output:

  team  points  score
0    A     NaN    NaN
1    A    -1.0   -1.0
2    A     1.0    1.0
3    B     NaN    NaN
4    B    -1.0   -1.0
5    B     1.0    1.0
6    C     NaN    NaN
7    C     1.0    1.0

edited Nov 18 '22 at 20:47

answered Nov 18 '22 at 19:54

mozway

194,879
13
39
75

But I want the first column as well, how to do that? – Bad Coder Nov 18 '22 at 19:56
Do you want to modify the original DataFrame? Or a new one? – mozway Nov 18 '22 at 19:58
this is not the result I wanted. I want to calculate the percentage change between two rows but when the denominator is zero, it is giving me a problem. So, when the denominator is zero, I want to show a 100% increase or 1 – Bad Coder Nov 18 '22 at 20:33
Sorry but the question is unclear. – mozway Nov 18 '22 at 20:34
Sorry for the confusion, I have edited the question. Would you mind reviewing the question? If I use sign and take the difference between 12 and 11, it will return -1, which is wrong – Bad Coder Nov 18 '22 at 20:35
Also replacing with 100% when the denominator is 0 is meaningless. For example a change from 0 to 5 would be 100% but 0.001 to 5 would be 5000% – mozway Nov 18 '22 at 20:37
yeah but I have only integer value, not the decimal value. – Bad Coder Nov 18 '22 at 20:38
See if the update gives you what you want – mozway Nov 18 '22 at 20:48

Allan Elder · Answer 2 · 2022-11-18T20:42:28.157

0

Not sure if you want to count the drops in score as a negative, but this will give you the calculation you're looking for (multiplying by 100 to get to how you're representing the percentages in your output). Basically, diff calculates the difference between current and prior.

df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
               'points': [12, 0, 19, 22, 0, 25, 0, 30],
               'score': [12, 0, 19, 22, 0, 25, 0, 30]

               })

df["score"] = df.groupby('team', sort=False)['score'].diff() * 100

print(df)

To set the rows to 1 / -1, simply use loc for positive / negative values and set accordingly like so

df.loc[df["score"] < 0, "score"] = -1
df.loc[df["score"] > 0, "score"] = 1

edited Nov 18 '22 at 20:42

answered Nov 18 '22 at 19:44

Allan Elder

4,052
17
19

Can I replace those 1900, 2500, and 3000 with 1 or 100? I think when it dropped from 12 to 0, it dropped by 100 % and when it rise from 0 to 19 it increased by 100% – Bad Coder Nov 18 '22 at 19:50
I have changed the expected result in the question – Bad Coder Nov 18 '22 at 19:51
@BadCoder - I've updated the answer to show how you can set to 1 / -1 (you can change it to 100 / -100 or whatever else you wish) – Allan Elder Nov 18 '22 at 20:43

score 0 · Answer 3 · answered Nov 18 '22 at 19:56

0

# take the sign using np.sign for the diff b/w two consecutive rows
df['chg']=np.sign(df.groupby('team')['score'].diff())
df

    team    points  score   chg
0     A        12      12   NaN
1     A         0       0   -1.0
2     A        19      19   1.0
3     B        22      22   NaN
4     B         0       0   -1.0
5     B        25      25   1.0
6     C         0       0   NaN
7     C        30      30   1.0

answered Nov 18 '22 at 19:56

Naveed

11,495
2
14
21

If I use sign, and take the difference between 12 and 11, it will return -1, which is wrong – Bad Coder Nov 18 '22 at 20:30
@BadCoder, this answer matches the expected result. Please update the expected result if it is incorrect. – Michael Ruth Nov 18 '22 at 20:48

score 0 · Answer 4 · answered May 02 '23 at 11:41

We need to be careful to avoid division by zero errors when there are 0s in the second column. To do this, we use the replace method to replace any 0s with 1s before performing the calculation.

df['percent_change'] = ((df['col_2'] - df['col_1']) / df['col_2'].replace(0, 1)) * 100

How to calculate percentage change with zero in pandas?

4 Answers4