Suppose I have a pandas DataFrame (data_stores) similar to the following:
store | item1 | item2 | item3
------------------------------
1     | 45    | 50    | 53
1     | 200   | 300   | 250
2     | 20    | 17    | 21
2     | 300   | 350   | 400
Let's say that I want to aggregate column item1 with the mean and columns item2 and item3 with the sum.
This is commonly done in the following way:
data_stores_total = data_stores.groupby(['store'], as_index=False).agg({'item1': 'mean', 'item2': 'sum', 'item3': 'sum'})
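For reference, here is a minimal self-contained version of this working approach. It assumes the sample values from the table above, and the DataFrame construction is only there to make the snippet reproducible:

```python
import pandas as pd

# Sample data matching the table above
data_stores = pd.DataFrame({
    "store": [1, 1, 2, 2],
    "item1": [45, 200, 20, 300],
    "item2": [50, 300, 17, 350],
    "item3": [53, 250, 21, 400],
})

# One dictionary entry per column: mean for item1, sum for item2 and item3
data_stores_total = data_stores.groupby(["store"], as_index=False).agg(
    {"item1": "mean", "item2": "sum", "item3": "sum"}
)
print(data_stores_total)
```

With this data, store 1 gets a mean of 122.5 for item1 and sums of 350 and 303 for item2 and item3.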
However, this cannot be written more concisely in the following way:
data_stores_total = data_stores.groupby(['store'], as_index=False).agg({'item1': 'mean', ['item2', 'item3']: 'sum'})
Nor can it be done in the following way, which would make more sense for dictionary keys:
data_stores_total = data_stores.groupby(['store'], as_index=False).agg({'mean': 'item1', 'sum': ['item2', 'item3']})
Is there any way to aggregate several columns of a DataFrame with the same function without writing a separate dictionary entry in the agg function for each of them?
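One direction I can think of, though I am not sure it is the idiomatic one, is to build the dictionary programmatically: write the specification keyed by function and invert it into the column-to-function mapping that agg accepts. A rough sketch, where spec_by_func and agg_spec are names I made up for illustration and the sample data is the same as above:

```python
import pandas as pd

# Sample data matching the table in the question
data_stores = pd.DataFrame({
    "store": [1, 1, 2, 2],
    "item1": [45, 200, 20, 300],
    "item2": [50, 300, 17, 350],
    "item3": [53, 250, 21, 400],
})

# Hypothetical specification keyed by function name (not a pandas feature)
spec_by_func = {"mean": ["item1"], "sum": ["item2", "item3"]}

# Invert it into the {column: function} dictionary that agg understands
agg_spec = {col: func for func, cols in spec_by_func.items() for col in cols}

data_stores_total = data_stores.groupby(["store"], as_index=False).agg(agg_spec)
print(data_stores_total)
```

This still passes one entry per column to agg, but the entries are generated rather than typed out, so adding another summed column only means extending one list.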