Pandas percentage of total with groupby with more than one column

Question

I have a pandas DataFrame that contains two attributes (State and Office ID) and two measures (Sales and Units) for each office.

I want to calculate each office's percentage of sales within its state (so the percentages in each state total 100%). I would like to do the same thing for units. This is similar to this problem (link), but I have multiple measures.

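import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999)
                             for _ in range(12)],
                   'units': [np.random.randint(100, 999)
                             for _ in range(12)]})
# Select several columns with a list (double brackets); the bare
# 'sales','units' form is rejected by recent pandas versions.
state_office = df.groupby(['state', 'office_id'])[['sales', 'units']].sum()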

Here is my desired output

I would like to solve it like this (or maybe with a loop if I have many measures):

state_pcts['sales'] = state_office.groupby(level=0).apply(lambda x: 100 * x['sales'] / float(x['sales'].sum()))    
state_pcts['units'] = state_pcts.groupby(level=0).apply(lambda x: 100 * x['units'] / float(x['units'].sum()))

But this does not work. It's possible to solve this by setting and resetting the index for every column, but there must be a better way with a lambda. For example:

state_office.set_index(["state", "office_id","units"], inplace = True)
state_pcts = state_office.groupby(level=0).apply(lambda x: 100 * x / float(x.sum())).reset_index()
state_pcts.set_index(["state", "office_id","sales"], inplace = True)
state_pcts = state_pcts.groupby(level=0).apply(lambda x: 100 * x / float(x.sum())).reset_index()

score 1 · Accepted Answer · answered Dec 15 '20 at 15:05

You can use groupby().transform() to keep the original index:

state_office.div(state_office.groupby(level=0).transform('sum')).mul(100)

Output:

                     sales      units
state office_id                      
AZ    2          16.981365  31.059160
      4          19.250033  23.664122
      6          63.768601  45.276718
CA    1          19.331879  22.049287
      3          33.858747  24.254215
      5          46.809373  53.696498
CO    1          36.851857  29.506546
      3          19.874290  35.246727
      5          43.273852  35.246727
WA    2          34.707233  34.645669
      4          35.511259  16.596002
      6          29.781508  48.758328
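
To check the result on numbers that are easy to verify by hand, here is a minimal sketch of the same transform-then-divide idea. The data values and the names state_totals and check are made up for illustration and are not part of the question; the only assumption is that the index levels are named state and office_id, as they are after the groupby above.

```python
import pandas as pd

# Small hand-made frame (hypothetical values, not the question's random data)
df = pd.DataFrame({
    "state": ["CA", "CA", "WA", "WA"],
    "office_id": [1, 2, 1, 2],
    "sales": [100, 300, 50, 150],
    "units": [10, 30, 20, 20],
})

state_office = df.groupby(["state", "office_id"])[["sales", "units"]].sum()

# transform('sum') returns per-state totals with the same index as
# state_office, so every (state, office_id) row carries its state's total
state_totals = state_office.groupby(level="state").transform("sum")

# Element-wise division aligns on the shared index and handles all
# measure columns at once, so no per-column loop is needed
state_pcts = state_office.div(state_totals).mul(100)

# Sanity check: within each state, every measure should add up to 100
check = state_pcts.groupby(level="state").sum()

print(state_pcts)
print(check)
```

Because the division is applied to the whole frame, adding more measures only means adding them to the column list in the first groupby.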
