17

I am trying to find the average monthly cost per user_id but i am only able to get average cost per user or monthly cost per user.

Because i group by user and month, there is no way to get the average of the second groupby (month) unless i transform the groupby output to something else.

This is my df:

     df = { 'id' : pd.Series([1,1,1,1,2,2,2,2]),
            'cost' : pd.Series([10,20,30,40,50,60,70,80]),
            'mth': pd.Series([3,3,4,5,3,4,4,5])}

   cost  id  mth
0    10   1    3
1    20   1    3
2    30   1    4
3    40   1    5
4    50   2    3
5    60   2    4
6    70   2    4
7    80   2    5

I can get monthly sum but i want the average of the months for each user_id.

df.groupby(['id','mth'])['cost'].sum()

id  mth
1   3       30
    4       30
    5       40
2   3       50
    4      130
    5       80

i want something like this:

id average_monthly
1 (30+30+40)/3
2 (50+130+80)/3
jxn
  • 7,685
  • 28
  • 90
  • 172

2 Answers2

22

Resetting the index should work. Try this:

In [19]: df.groupby(['id', 'mth']).sum().reset_index().groupby('id').mean()  
Out[19]: 
    mth       cost
id                
1   4.0  33.333333
2   4.0  86.666667

You can just drop mth if you want. The logic is that after the sum part, you have this:

In [20]: df.groupby(['id', 'mth']).sum()
Out[20]: 
        cost
id mth      
1  3      30
   4      30
   5      40
2  3      50
   4     130
   5      80

Resetting the index at this point will give you unique months.

In [21]: df.groupby(['id', 'mth']).sum().reset_index()
Out[21]: 
   id  mth  cost
0   1    3    30
1   1    4    30
2   1    5    40
3   2    3    50
4   2    4   130
5   2    5    80

It's just a matter of grouping it again, this time using mean instead of sum. This should give you the averages.

Let us know if this helps.

WGS
  • 13,969
  • 4
  • 48
  • 51
  • 5
    No need to `reset_index`. `df.groupby(['id', 'mth']).sum().groupby(level=0).mean()` will produce the same result. – Kartik Oct 16 '16 at 07:05
  • 3
    How can we assign this back to the original dataframe as a new column? The [transform trick](https://stackoverflow.com/a/30244979/281545) can't cope with the double group by – Mr_and_Mrs_D May 05 '18 at 20:39
  • To get the transform, you could first set `id` as the index, then run the `groupby` operations: ``df = df.set_index('id'); df['avg'] = df.groupby(['id','mth']).sum().groupby(level=0).mean()`` – sammywemmy Jul 02 '20 at 09:57
-1
df_monthly_average = (
    df.groupby(["InvoiceMonth", "InvoiceYear"])["Revenue"]
    .sum()
    .reset_index()
    .groupby("Revenue")
    .mean()
    .reset_index()
)
Jack Deeth
  • 3,062
  • 3
  • 24
  • 39
  • Hi and thanks for the answer. Its great that it works for you, but it would help us if you could explain what you did and how did you solve the initial problem! – Simas Joneliunas Feb 10 '22 at 05:27