For each Name in the following dataframe, I'm trying to find the percentage change in the Amount column from one Time to the next:

[Image: dataframe with columns Name, Time and Amount]

Code to create the dataframe:

import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Ali', 'Ali', 'Cala', 'Cala', 'Cala', 'Elena', 'Elena', 'Elena'],
                   'Time': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16]})

df.sort_values(['Name', 'Time'], inplace = True)

The first approach I tried (based on this question and answer) used groupby and pct_change:

df['pct_change'] = df.groupby(['Name'])['Amount'].pct_change()

With the result:

[Image: dataframe with the added pct_change column]

This doesn't seem to be grouping by the name because it is the same result as if I had used no groupby and called df['Amount'].pct_change(). According to the Pandas Documentation for pandas.core.groupby.DataFrameGroupBy.pct_change, the above approach should work to calculate the percentage change of each value to the previous value within a group.
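One way to check whether an installed version is affected is to compute the per-group percentage change independently and compare. The sketch below assumes that dividing each Amount by the previous Amount within the same Name (via `shift`) is an acceptable reference; the names `previous`, `expected`, `actual` and `matches` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Ali', 'Ali', 'Cala', 'Cala', 'Cala', 'Elena', 'Elena', 'Elena'],
                   'Time': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16]})
df = df.sort_values(['Name', 'Time'])

# Reference value: previous Amount within the same Name (NaN for each group's first row).
previous = df.groupby('Name')['Amount'].shift()
expected = df['Amount'] / previous - 1

# The call under test.
actual = df.groupby('Name')['Amount'].pct_change()

# equal_nan=True treats the NaN at the start of each group as a match.
matches = np.allclose(actual, expected, equal_nan=True)

print(pd.DataFrame({'expected': expected, 'actual': actual}))
print('groupby pct_change respects groups:', matches)
```

If the last line prints False, the first row of each group after the first will show a value in `actual` where `expected` has NaN.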

For a second approach I used groupby with apply and pct_change:

df['pct_change_with_apply'] = df.groupby('Name')['Amount'].apply(lambda x: x.pct_change())

With the result:

[Image: dataframe with the added pct_change_with_apply column]

This time all the percentage changes are correct.
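A related option is `transform`, which runs the function on each group and returns a result aligned with the original index. This is a minimal sketch rather than a recommendation from the thread. The column name `pct_change_transform` is made up for illustration, and the note about `apply` is hedged: on some newer pandas releases `apply` may prepend the group keys to the index, which would make direct column assignment awkward.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Ali', 'Ali', 'Cala', 'Cala', 'Cala', 'Elena', 'Elena', 'Elena'],
                   'Time': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16]})
df = df.sort_values(['Name', 'Time'])

# transform calls Series.pct_change on each Name's Amount values separately
# and keeps the original row index, so the result can be assigned directly.
df['pct_change_transform'] = (
    df.groupby('Name')['Amount'].transform(lambda s: s.pct_change())
)

print(df)
```

Because the lambda calls `Series.pct_change` on each group directly, it does not go through the `pct_change` method of the groupby object.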

Why does the groupby and pct_change approach not return the correct values, but using groupby with apply does?

Edit January 28, 2019: This behavior has been corrected in the latest version of Pandas, 0.24.0. To upgrade, run `pip install -U pandas`.

smci
willk
  • Weird, `df.groupby(['Name'])['Amount'].pct_change()` is giving me the correct result. Am I just missing something? – ALollz Jun 28 '18 at 15:00
  • @ALollz what version of pandas? – piRSquared Jun 28 '18 at 15:01
  • I can confirm bugged behavior. Pandas 0.23.1 – piRSquared Jun 28 '18 at 15:02
  • I'm using 0.22.0 – ALollz Jun 28 '18 at 15:02
  • @caseWestern I'm guessing this is a bug in new version. Please confirm your version. Pandas is trapping the `pct_change` method on the `groupby` object and botching it up. – piRSquared Jun 28 '18 at 15:04
  • Already reported https://github.com/pandas-dev/pandas/issues/21621 – piRSquared Jun 28 '18 at 15:06
  • Using Pandas 0.23.1. Sorry if this is an issue that has already been reported! – willk Jun 28 '18 at 15:08
  • Please don't apologize (-: This is a good question. I'm just verifying that this is a bug and has been reported. And I found out it was reported because I was going to report it. I wouldn't have even tried to report it if you didn't bring it up. – piRSquared Jun 28 '18 at 15:12
  • It's interesting because `diff` does not have this issue. For example, `df.groupby('Name')['Amount'].diff()` returns the expected result. – willk Jun 28 '18 at 15:15
  • This has been fixed in a [pull request for Pandas](https://github.com/pandas-dev/pandas/pull/21235). If you install the most recent version of the [Pandas master branch on GitHub](https://github.com/pandas-dev/pandas) you can get the fix. Otherwise, I think this fix will be in the next release of the Pandas library. – willk Dec 25 '18 at 22:42

1 Answer

As already noted by @piRSquared in the comments, this is due to a bug filed on GitHub as issue #21621. It appears to have been fixed under milestone 0.24.0 (due 2018-12-31). My version (0.23.4) still shows the bugged behaviour.
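For code that has to run on both older and newer installations, one possible pattern is to check the version and fall back to a manual calculation. The sketch below is only an illustration: the helper names `has_groupby_pct_change_fix` and `grouped_pct_change` are hypothetical, the 0.24 threshold is taken from this thread, and the fallback relies on the comment above that `diff` is unaffected.

```python
import pandas as pd


def has_groupby_pct_change_fix(version=pd.__version__):
    """Return True for pandas 0.24 or later (threshold assumed from this thread)."""
    # Only the major and minor components are compared, e.g. '0.23.4' -> (0, 23).
    major, minor = (int(part) for part in version.split('.')[:2])
    return (major, minor) >= (0, 24)


def grouped_pct_change(frame, key, column):
    """Percentage change of `column` from one row to the next within each `key` group."""
    grouped = frame.groupby(key)[column]
    if has_groupby_pct_change_fix():
        return grouped.pct_change()
    # Fallback: (current - previous) / previous, both computed per group.
    return grouped.diff() / grouped.shift()


df = pd.DataFrame({'Name': ['Ali', 'Ali', 'Ali', 'Cala', 'Cala', 'Cala', 'Elena', 'Elena', 'Elena'],
                   'Time': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Amount': [24, 52, 34, 95, 98, 54, 32, 20, 16]})
df = df.sort_values(['Name', 'Time'])

df['pct_change'] = grouped_pct_change(df, 'Name', 'Amount')
print(df)
```

The fallback uses the same definition of percentage change, so both branches should produce NaN for the first row of every group.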

gosuto