I have a dataset like this (example purpose)
df = pd.DataFrame({
'Store' : [100, 100, 100, 100, 101, 101, 101, 101],
'Product' : [5, 3, 10, 1, 3, 11, 2, 5],
'Category' : ['A', 'B', 'C', 'A', 'B', 'A', 'C', 'A'],
'Sales' : [100, 235, 120, 56, 789, 230, 300, 35]
})
So it is like
Store Product Category Sales
100 5 A 100
100 3 B 235
100 10 C 120
100 1 A 56
101 3 B 789
101 11 A 230
101 2 C 300
101 5 A 35
Each store has some products and each product has some category. I need to find out the total sales for each store and percentage sales for each category in each stores. So result has to be something like this:
total_Sales Category_A Category_B Category_C
Store
100 511 30.528376 45.988258 23.483366
101 1354 19.571640 58.271787 22.156573
(The category columns are in %)
Currently I am doing it like this:
df1 = df.groupby(['Store']).apply(lambda x: x['Sales'].sum())
df1 = df1.to_frame()
df1 = df1.rename(columns={0 : 'Sales'})
def category_util(x, col, cat):
total_sales = x['Sales'].sum()
cat_sales = x[x[col] == cat]['Sales'].sum()
if cat_sales == 0:
return 0
else:
return cat_sales*100/total_sales
df1['Category_A'] = df.groupby(['Store']).apply(lambda x: category_util(x, 'Category', 'A'))
df1['Category_B'] = df.groupby(['Store']).apply(lambda x: category_util(x, 'Category', 'B'))
df1['Category_C'] = df.groupby(['Store']).apply(lambda x: category_util(x, 'Category', 'C'))
df1
is the desired output. It is working fine, but every apply
function is sorting the grouping columns again and again and for a big dataset, it is highly time consuming. I want to do this in one function call. I tried something like:
df.groupby(['Store']).agg([lambda x: category_util(x, 'Category', 'A'),
lambda x: category_util(x, 'Category', 'B'),
lambda x: category_util(x, 'Category', 'C')])
But it fails with a KeyError
for 'Sales`
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: 'Sales'
Any solution for this? Is there any way to use apply
function with an array of lambda functions and calculate all the columns in one go? If it is not possible with apply
, is it possible to do with agg
? It will really save me a lot of time. Thank you in advance.