1

I am trying to figure out how to sum on the outer most level of my multi-index. So I want to sum the COUNTS column based on the individual operators, and all the shops listed for it.

df=pd.DataFrame(data.groupby('OPERATOR').SHOP.value_counts())
df=df.rename(columns={'SHOP':'COUNTS'})
df['COUNTS'] = df['COUNTS'].astype(float)
df['percentage']=df.groupby(['OPERATOR'])['COUNTS'].sum()
df['percentage']=df.sum(axis=0, level=['OPERATOR', 'SHOP'])
df.head()

                      COUNTS    percentage
OPERATOR    SHOP        
AVIANCA     CC9          3.0    3.0
            FF9          1.0    1.0
            IHI          1.0    1.0
Aegean      HA9          33.0   33.0
            IN9          24.0   24.0

When I use the df.sum call, it lets me call it on both levels, but then when I change it to df.sum(axis=0, level=['OPERATOR'], it results in the percentage column being NaN. I originally had the count column as int so I thought maybe that was the issue, and converted to float, but this didn't resolve the issue. This is the desired output:

                      COUNTS    percentage
OPERATOR    SHOP        
AVIANCA     CC9          3.0    5.0
            FF9          1.0    5.0
            IHI          1.0    5.0
Aegean      HA9          33.0   57.0
            IN9          24.0   57.0

(This is just a stepping stone on the way to calculating the percentage for each shop respective to the operator, i.e. the FINAL final output would be):

                      COUNTS    percentage
OPERATOR    SHOP        
AVIANCA     CC9          3.0    .6
            FF9          1.0    .2
            IHI          1.0    .2
Aegean      HA9          33.0   .58
            IN9          24.0   .42

So bonus points if you include the last step of that as well!! Please help me!!!

  • Could you edit your question to include the data used? I'm not too familiar with pandas so being able to play around with the data would be useful. – Cmd858 Jun 25 '21 at 18:55

1 Answers1

1

Group by OPERATOR and normalize your data:

df['percentage'] = df.groupby('OPERATOR')['COUNTS'] \
                     .transform(lambda x: x / x.sum()) \
                     .round(2)
>>> df
               COUNTS  percentage
OPERATOR SHOP
AVIANCA  CC9      3.0        0.60
         FF9      1.0        0.20
         IHI      1.0        0.20
Aegean   HA9     33.0        0.58
         IN9     24.0        0.42
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • If you always want to group by the outer most level, no matter how this index column is called you could use `df.groupby(level=0)` instead of `df.groupby('OPERATOR')` – Tobias Jun 26 '21 at 12:06