19

I'm trying to find the period-over-period growth in Value for each unique group, grouped by (Company, Group, and Date).

Company Group Date     Value
A       X     2015-01  1
A       X     2015-02  2
A       X     2015-03  1.5
A       XX    2015-01  1
A       XX    2015-02  1.5
A       XX    2015-03  0.75
A       XX    2015-04  1
B       Y     2015-01  1
B       Y     2015-02  1.5
B       Y     2015-03  2
B       Y     2015-04  3
B       YY    2015-01  2
B       YY    2015-02  2.5
B       YY    2015-03  3

I've tried:

df.groupby(['Date','Company','Group']).pct_change()

but this returns all NaN.

The result I'm looking for is:

Company Group Date     Value/People
A       X     2015-01  NaN
A       X     2015-02  1.0
A       X     2015-03  -0.25
A       XX    2015-01  NaN
A       XX    2015-02  0.5
A       XX    2015-03  -0.5
A       XX    2015-04  0.33
B       Y     2015-01  NaN
B       Y     2015-02  0.5
B       Y     2015-03  0.33
B       Y     2015-04  0.5
B       YY    2015-01  NaN
B       YY    2015-02  0.25
B       YY    2015-03  0.2
smci
  • 32,567
  • 20
  • 113
  • 146
user3357979
  • 607
  • 1
  • 5
  • 12
  • Your issue here is that **you want to groupby multiple columns**, then do a `pct_change()`. – smci Feb 11 '21 at 06:54

3 Answers3

19

you want to get your date into the row index and groups/company into the columns

d1 = df.set_index(['Date', 'Company', 'Group']).Value.unstack(['Company', 'Group'])
d1

enter image description here

then use pct_change

d1.pct_change()

enter image description here

OR

with groupby

df['pct'] = df.sort_values('Date').groupby(['Company', 'Group']).Value.pct_change()
df

enter image description here

piRSquared
  • 285,575
  • 57
  • 475
  • 624
5

I'm not sure the groupby method works as intended as of Pandas 0.23.4 at least.

df['pct'] = df.sort_values('Date').groupby(['Company', 'Group']).Value.pct_change()

Produces this, which is incorrect for purposes of the question:

Incorrect Outcome

The Index+Stack method still works as intended, but you need to do additional merges to get it into the original form requested.

d1 = df.set_index(['Date', 'Company', 'Group']).Value.unstack(['Company', 'Group'])
d1 = d1.pct_change().stack([0,1]).reset_index()
df = df.merge(d1, on=['Company', 'Group', 'Date'], how='left')
df.rename(columns={0: 'pct'}, inplace=True)
df

Correct Outcome

SimonR
  • 117
  • 1
  • 8
3
df['Pct_Change'] = df.groupby(['Company','Group'])['Value'].pct_change()