pandas version: 0.24.2
I'm trying to find growth rate by date for each unique group (country). My data looks like this:
'''
date country metric1
2020-01-03 AD 45793
2020-03-28 US 13254524
2020-02-04 HT 224126
2020-02-04 AQ 106564
2020-01-11 LI 69168
2020-01-02 AD 47855
2020-03-23 MW 45795
2020-03-18 GM 38205
2020-03-06 CI 488755
'''
If I run:
df['metric1_change'] = df.groupby(['country'])['metric1'].pct_change()
The outcome is incorrect and the pct_change does not relate to the previous values. See example outcome:
date country metric1 metric1_change
2020-01-01 AD 41550 0.048289
2020-01-02 AD 47855 0.184149
2020-01-03 AD 45793 0.255084
The first date is 2020-01-01 so the percentage change there should already be NaN and the others are completely incorrect.
If I sort before with:
df.sort_values(by=['country','date'])
I also do not receive the correct output:
date country metric1 metric1_change
2020-01-01 AD 41550 0.046916
2020-01-02 AD 47855 0.197273
2020-01-03 AD 45793 0.674088
Does anyone have any experience with this?
I have seen this question Pandas groupby and pct change not returning expected value but the apply doesn't help me and I think i'm making a more basic mistake.