I have a pandas dataframe with that contains 2 attributes (State & Office ID) and two measures ( Sales and Units) for that office.
I want to calculate the percentage of sales per office in a given state (total of all percentages in each state is 100%). I would like to do the same thing for units. Similar to this problem, but I have multiple measures (link).
np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
'office_id': list(range(1, 7)) * 2,
'sales': [np.random.randint(100000, 999999)
for _ in range(12)],
'units': [np.random.randint(100, 999)
for _ in range(12)]})
state_office = df.groupby(['state', 'office_id'])['sales','units'].sum()
I would like to solve it like this (or maybe with a loop if I have many measures):
state_pcts['sales'] = state_office.groupby(level=0).apply(lambda x: 100 * x['sales'] / float(x['sales'].sum()))
state_pcts['units'] = state_pcts.groupby(level=0).apply(lambda x: 100 * x['units'] / float(x['units'].sum()))
But this does not work. It's possible to solve this by setting and resetting the index for every column, but there must be a better way with Lambda. EG
state_office.set_index(["state", "office_id","units"], inplace = True)
state_pcts = state_office.groupby(level=0).apply(lambda x: 100 * x / float(x.sum())).reset_index()
state_pcts.set_index(["state", "office_id","sales"], inplace = True)
state_pcts = state_pcts.groupby(level=0).apply(lambda x: 100 * x / float(x.sum())).reset_index()