0

pandas version: 0.24.2

I'm trying to find growth rate by date for each unique group (country). My data looks like this:

'''

date        country  metric1        
2020-01-03  AD       45793
2020-03-28  US       13254524
2020-02-04  HT       224126
2020-02-04  AQ       106564
2020-01-11  LI       69168
2020-01-02  AD       47855
2020-03-23  MW       45795
2020-03-18  GM       38205
2020-03-06  CI       488755

'''

If I run:

df['metric1_change'] = df.groupby(['country'])['metric1'].pct_change() 

The outcome is incorrect and the pct_change does not relate to the previous values. See example outcome:

date        country   metric1   metric1_change
2020-01-01  AD        41550     0.048289
2020-01-02  AD        47855     0.184149
2020-01-03  AD        45793     0.255084

The first date is 2020-01-01 so the percentage change there should already be NaN and the others are completely incorrect.

If I sort before with:

df.sort_values(by=['country','date'])

I also do not receive the correct output:

date        country  metric1    metric1_change
2020-01-01  AD       41550      0.046916
2020-01-02  AD       47855      0.197273
2020-01-03  AD       45793      0.674088

Does anyone have any experience with this?

I have seen this question Pandas groupby and pct change not returning expected value but the apply doesn't help me and I think i'm making a more basic mistake.

WvonR
  • 158
  • 3
  • 13

0 Answers0