I have a DataFrame that I want to group by two variables and then perform calculations within each group. Is there an easy way to do this and put the results back into a DataFrame when I'm done? For example, like this:
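import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 30, 12, 122, 345],
                   'B': [1, 1, 1, 2, 3, 3, 3, 2, 3, 4],
                   'C': [101, 230, 12, 122, 345, 23, 943, 83, 923, 10]})
total = []
avg = []
AID = []
BID = []
# Loop over every (A, B) group and collect the results by hand
for name, group in df.groupby(['A', 'B']):
    total.append(group.C.sum())
    # "avg" is the group sum divided by the number of distinct C values
    avg.append(group.C.sum() / group.C.nunique())
    AID.append(name[0])
    BID.append(name[1])
x = pd.DataFrame({'total': total, 'avg': avg, 'AID': AID, 'BID': BID})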
But obviously in a much more efficient way than this loop?
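For reference, here is a minimal sketch of the kind of result I am after, written without the explicit loop. It assumes a pandas version that supports named aggregation in `groupby(...).agg(...)` (0.25 or later); the intermediate column name `n_unique` is just a placeholder I made up, and I am not sure this is the best approach.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2, 2, 30, 12, 122, 345],
    'B': [1, 1, 1, 2, 3, 3, 3, 2, 3, 4],
    'C': [101, 230, 12, 122, 345, 23, 943, 83, 923, 10],
})

# Aggregate each (A, B) group in one call; as_index=False keeps
# A and B as ordinary columns instead of a MultiIndex.
x = (
    df.groupby(['A', 'B'], as_index=False)
      .agg(total=('C', 'sum'), n_unique=('C', 'nunique'))
)

# Same "avg" as in the loop: group sum divided by the number of distinct C values.
x['avg'] = x['total'] / x['n_unique']

# Drop the helper column (hypothetical name) and match the loop's column names.
x = x.drop(columns='n_unique').rename(columns={'A': 'AID', 'B': 'BID'})
print(x)
```

This should give one row per (A, B) pair with the columns `AID`, `BID`, `total`, and `avg`, the same information the loop collects into lists.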