2

I have a data frame that I want to group by two variables, and then perform calculation within those variables. Is there any easy way to do this and put the information BACK into a DataFrame when I'm done, i.e. like this:

df=pd.DataFrame({'A':[1,1,1,2,2,2,30,12,122,345],
'B':[1,1,1,2,3,3,3,2,3,4],
'C':[101,230,12,122,345,23,943,83,923,10]})

total = []
avg = []
AID = []
BID = []
for name, group in df.groupby(['A', 'B']):
    total.append(group.C.sum())
    avg.append(group.C.sum()/group.C.nunique())
    AID.append(name[0])
    BID.append(name[1])

x = pd.DataFrame({'total':total,'avg':avg,'AID':AID,'BID':BID})

But obviously much more efficiently?

user1566200
  • 1,826
  • 4
  • 27
  • 47

1 Answers1

2

You can use pandas aggregate function after groupby:

import pandas as pd
import numpy as np
df.groupby(['A', 'B'])['C'].agg({'total': np.sum, 'avg': np.mean}).reset_index()

#      A    B   total          avg
# 0    1    1     343   114.333333
# 1    2    2     122   122.000000
# 2    2    3     368   184.000000
# 3   12    2      83    83.000000
# 4   30    3     943   943.000000
# 5  122    3     923   923.000000
# 6  345    4      10    10.000000
Psidom
  • 209,562
  • 33
  • 339
  • 356
  • I feel like this is the answer, but I have one more question - let's say that 'avg' is really the sum() divided by the unique elements in another column - how would I reference that other column. np.sum/np.unique(group['other_column'])? Does that make sense? – user1566200 Jul 25 '16 at 19:30
  • The I would probably do it in two steps. `df1 = df.groupby(['A', 'B'])['C'].agg({'total': np.sum}); df1['avg'] = df1.total/df.groupby(['A', 'B'])['other'].agg({'num': pd.Series.nunique}).num; df1.reset_index()` something like this; – Psidom Jul 25 '16 at 19:39