pandas groupby with two key

Question

I took a whole afternoon trying to implement this task but failed ,I've got a pandas data frame like this

columns=[ka,kb_1,kb_2,timeofEvent,timeInterval]
0:'3M' '2345' '2345' '2014-10-5',3000
1:'3M' '2958' '2152' '2015-3-22',5000
2:'GE' '2183' '2183' '2012-12-31',515
3:'3M' '2958' '2958' '2015-3-10',395
4:'GE' '2183' '2285' '2015-4-19',1925
5:'GE' '2598' '2598' '2015-3-17',1915

What is to be implemented is a new data frame grouped by "ka and kb_1" below

columns=[ka,kb,errorNum,errorRate,totalNum of records]
'3M','2345',0,0%,1
'3M','2958',1,50%,2
'GE','2183',1,50%,2
'GE','2598',0,0%,1

(definition of error Record: when kb_1!=kb_2,the corresponding record is treated as abnormal record)

My code is like this

df['isError'] = (df['kb_1'] != df['kb_2']).astype('int')
grouped2 = df.groupby(['ka', 'kb_1'])

df_rst = pd.DataFrame()
df_rst['ka']  =grouped2['ka'].all()
df_rst['kb_1'] = grouped2['kb_1'].all()
df_rst['errorNum'] = grouped2['isError'].transform(sum)
df_rst['totalNum of records'] = grouped2.size()
df_rst['Soll_neq_Letzt_error_rate'] = df_rst['errorNum'].astype('float').div(df_rst['totalNum'].astype('float'), axis='index')
df_rst.to_csv('rst.csv',index=False)

but the result is not what I wanted.

For instance, the column kb_1 becomes true/false, and errorNum becomes Nan. Can anyone explain why and give an workable implementation? Thanks

Please show sample data and desired results. Saying the results are not what you wanted doesn't really tell us what you do want. — JohnE, May 14 '15 at 17:14

JohnE · Accepted Answer · 2015-05-14T18:36:25.287

1

I'm not sure exactly what you did, but I don't think you were that far off.

df2 = df.groupby(['ka','kb_1'])['isError'].agg({ 'errorNum':  'sum',
                                                 'recordNum': 'count' })

df2['errorRate'] = df2['errorNum'] / df2['recordNum']

         recordNum  errorNum  errorRate
ka kb_1                                
3M 2345          1         0        0.0
   2958          2         1        0.5
GE 2183          2         1        0.5
   2598          1         0        0.0

edited May 14 '15 at 18:36

answered May 14 '15 at 18:30

JohnE

29,156
8
79
109

wow,thanks,can't image it just takes two lines to complete, as I was still thinking of joining the table. – sunxd May 14 '15 at 19:13
Glad that helps, please remember to click the checkmark if you are reasonably happy with the answer. – JohnE May 14 '15 at 21:13
done, I met another difficulty in achieving this because the csv I try to manipulate too big to put into memory, see http://stackoverflow.com/questions/30245640/pandas-read-csv-out-of-memory – sunxd May 15 '15 at 07:21

pandas groupby with two key

1 Answers1

Linked