Pandas 0.23 groupby and pct change not returning expected value

Question

For each Name in the following DataFrame, I'm trying to find the percentage change in the Amount column from one Time to the next:

Code to create the dataframe:

import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Ali', 'Ali', 'Cala', 'Cala', 'Cala', 'Elena', 'Elena', 'Elena'],
                   'Time': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16]})

df.sort_values(['Name', 'Time'], inplace=True)

The first approach I tried (based on this question and answer) used groupby and pct_change:

df['pct_change'] = df.groupby(['Name'])['Amount'].pct_change()

With the result:

This doesn't seem to be grouping by Name, because the result is the same as if I had skipped the groupby and called df['Amount'].pct_change(). According to the Pandas documentation for pandas.core.groupby.DataFrameGroupBy.pct_change, the approach above should calculate the percentage change of each value relative to the previous value within its group.
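To make the symptom concrete, here is a minimal sketch (assuming a pandas release where `groupby(...).shift()` respects group boundaries, as current releases do) that builds the group-aware change by hand and compares it with the ungrouped `df['Amount'].pct_change()`, which the question reports matches the buggy output. The column names `ungrouped` and `expected` are illustrative only. The two columns disagree only on the first row of each Name after the first, where the ungrouped call divides by the last Amount of the previous Name.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Ali', 'Ali', 'Cala', 'Cala', 'Cala', 'Elena', 'Elena', 'Elena'],
                   'Time': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16]})
df = df.sort_values(['Name', 'Time'])

# Change computed down the whole column, ignoring where one Name ends
# and the next begins (the question says the buggy groupby call matched this)
df['ungrouped'] = df['Amount'].pct_change()

# Group-aware change: divide by the previous Amount within the same Name;
# groupby().shift() leaves NaN on the first row of every group
prev_in_group = df.groupby('Name')['Amount'].shift(1)
df['expected'] = df['Amount'] / prev_in_group - 1

# Rows where a new Name starts; only these can differ between the two columns
boundary = df['Name'] != df['Name'].shift(1)
print(df[boundary])
```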

For a second approach I used groupby with apply and pct_change:

df['pct_change_with_apply'] = df.groupby('Name')['Amount'].apply(lambda x: x.pct_change())

With the result:

This time all the percentage changes are correct.
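One caveat when rerunning the apply version today: since pandas 2.0, `group_keys=True` (the default) is also honored for transform-like `apply` calls, so the result may come back indexed by (Name, row) and no longer align with `df` for assignment. A minimal sketch of an alternative uses `transform`, which returns a result indexed like the input; note this swaps `apply` for `transform` and is not what the question originally ran. The column name `pct_change_with_transform` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Ali', 'Ali', 'Cala', 'Cala', 'Cala', 'Elena', 'Elena', 'Elena'],
                   'Time': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16]})
df = df.sort_values(['Name', 'Time'])

# transform runs the lambda on each Name's Amount values separately and
# returns a Series aligned to df's own index, so it can be assigned directly
df['pct_change_with_transform'] = (
    df.groupby('Name')['Amount'].transform(lambda x: x.pct_change())
)
print(df)
```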

Why does the groupby and pct_change approach not return the correct values, while groupby with apply does?

Edit January 28, 2019: This behavior has been corrected in the latest version of Pandas, 0.24.0. To upgrade, run `pip install -U pandas`.
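If code has to run on machines that may still have 0.23.x installed, one option is to check `pd.__version__` and fall back to a group-aware shift on older releases. This is a minimal sketch: `pandas_major_minor` and `grouped_pct_change` are hypothetical helpers, the version parsing assumes an "X.Y..." string, and 0.24 is used as the cutoff only because that is the release the edit names as the fix.

```python
import pandas as pd


def pandas_major_minor():
    """Return (major, minor) from pd.__version__, assuming an 'X.Y...' string."""
    major, minor = pd.__version__.split('.')[:2]
    return int(major), int(minor)


def grouped_pct_change(frame, key, column):
    """Hypothetical helper: percentage change of `column` within each `key` group."""
    grouped = frame.groupby(key)[column]
    if pandas_major_minor() >= (0, 24):
        # 0.24.0 is the release reported as fixing issue #21621
        return grouped.pct_change()
    # Older releases: divide by a group-aware shift instead of calling pct_change
    return frame[column] / grouped.shift(1) - 1


df = pd.DataFrame({'Name': ['Ali', 'Ali', 'Ali', 'Cala', 'Cala', 'Cala', 'Elena', 'Elena', 'Elena'],
                   'Time': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16]})
df = df.sort_values(['Name', 'Time'])

print(pd.__version__)
out = grouped_pct_change(df, 'Name', 'Amount')
print(out)
```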

Weird, `df.groupby(['Name'])['Amount'].pct_change()` is giving me the correct result. Am I just missing something? — ALollz, Jun 28 '18 at 15:00
@caseWestern I'm guessing this is a bug in the new version. Please confirm your version. Pandas is trapping the `pct_change` method on the `groupby` object and botching it up. — piRSquared, Jun 28 '18 at 15:04
Already reported https://github.com/pandas-dev/pandas/issues/21621 — piRSquared, Jun 28 '18 at 15:06
Using Pandas 0.23.1. Sorry if this is an issue that has already been reported! — willk, Jun 28 '18 at 15:08
Please don't apologize (-: This is a good question. I'm just verifying that this is a bug and has been reported. And I found out it was reported because I was going to report it. I wouldn't have even tried to report it if you didn't bring it up. — piRSquared, Jun 28 '18 at 15:12
It's interesting because `diff` does not have this issue. For example, `df.groupby('Name')['Amount'].diff()` returns the expected behavior. — willk, Jun 28 '18 at 15:15
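Because percentage change is just (current - previous) / previous, that observation suggests a workaround for affected versions: divide the grouped `diff` by the grouped `shift`. A minimal sketch, assuming (as the comment reports for `diff` on 0.23.1) that `groupby(...).diff()` and `groupby(...).shift()` both respect group boundaries; the column name `pct_from_diff` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Ali', 'Ali', 'Cala', 'Cala', 'Cala', 'Elena', 'Elena', 'Elena'],
                   'Time': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16]})
df = df.sort_values(['Name', 'Time'])

grouped = df.groupby('Name')['Amount']

# (current - previous) / previous, with both pieces computed per Name;
# the first row of each group is NaN because it has no previous value
df['pct_from_diff'] = grouped.diff() / grouped.shift(1)
print(df)
```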
This has been fixed in a [pull request for Pandas](https://github.com/pandas-dev/pandas/pull/21235). If you install the most recent version of the [Pandas master branch on GitHub](https://github.com/pandas-dev/pandas) you can get the fix. Otherwise, I think this fix will be in the next release of the Pandas library. — willk, Dec 25 '18 at 22:42

gosuto · Accepted Answer · 2019-01-24T20:27:49.743

As already noted by @piRSquared in the comments, this is due to a bug filed on GitHub under issue #21621. It already looks to be solved in milestone 0.24.0 (due 2018-12-31). My version (0.23.4) still displayed this buggy behaviour.

edited Jan 24 '19 at 20:27

answered Dec 27 '18 at 12:51

gosuto

Thanks for pointing this out! Can you confirm this is solved in the latest release of Pandas? – willk Jan 24 '19 at 15:21

`0.24.0` has been moved to 2019-01-31. – gosuto Jan 24 '19 at 20:27

I can confirm `0.24.0` is out now and that the issue has been solved. Run `pip install --upgrade pandas` to get the latest release. – gosuto Jan 27 '19 at 09:25