0

I have a simple dataframe as shown below

Entity class   7dayAvg  10dayAvg
A      class1   50      100
A      class2   25      200
B      class1   40      80
B      class2   80      200

The problem at hand requires finding the percentage difference(7dayAvg and for 10dayAvg) between class1 and class2 for each entity A and B. Hence, the answer should be:

Entity  7day%Diff     10day%diff
A       (50-25)/50    (100-200)/100
B       (40-80)/40    (80-200)/80
CodeMaster
  • 431
  • 4
  • 14
  • 1
    Doing "operation per entity" just means `.groupby('Entity')` ; you could also use `.join()`. In your case the operation is an aggregation, so `df.groupby('Entity').agg(...)` – smci Feb 11 '21 at 03:24
  • 1
    Does this answer your question? [Pandas groupby multiple columns, with pct\_change](https://stackoverflow.com/questions/40273251/pandas-groupby-multiple-columns-with-pct-change) – smci Feb 11 '21 at 06:55

3 Answers3

3

Try with groupby().pct_change():

df[['Entity']].join(-df.groupby('Entity')[['7dayAvg', '10dayAvg']]
                       .pct_change().dropna(),
                    how='inner')

Output:

  Entity  7dayAvg  10dayAvg
1      A      0.5      -1.0
3      B     -1.0      -1.5
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
1

You have to attempt to code it; here are hints for you; post your code attempt in the edited question if you get stuck:

  • groupby('Entity'), then use .aggregate() with a custom aggregation defied by a dictionary
  • you want to compute 7day%Diff which is the relative difference in 7dayAvg
  • ditto 10day%diff on 10dayAvg

It's a bad idea to use special or magic characters in column (or index) names, so I'd use the names 7day_diff, 10day_diff or even just diff_7d, diff_10d for clarity and simplicitity. (See also PEP-8 style recommendations)

smci
  • 32,567
  • 20
  • 113
  • 146
1
>>> x = pd.DataFrame({'entity': {0: 'a', 1: 'a', 2: 'b', 3: 'b'}, 'class': {0: 1, 1: 2, 2: 1, 3: 2}, '7da': {0: 50, 1: 25, 2: 40, 3: 80}, '10da': {0: 100, 1: 200, 2: 80, 3: 200}})

Another option is two divide two data frames, one with the difference

>>> x.set_index('entity').groupby('entity').diff().dropna()*-1
        class   7da   10da
entity
a        -1.0  25.0 -100.0
b        -1.0 -40.0 -120.0

Another with the denominator:

>>> x.set_index('entity').groupby('entity').first()
        class  7da  10da
entity
a           1   50   100
b           1   40    80

Giving:

>>> ((x.set_index('entity').groupby('entity').diff().dropna()*-1) / (x.set_index('entity').groupby('entity').first())).drop('class', axis = 1)
        7da  10da
entity
a       0.5  -1.0
b      -1.0  -1.5
Adam Zeldin
  • 898
  • 4
  • 6