3

I have a DataFrame with three levels on the main index:

from pandas import *
df_multi = DataFrame(np.random.rand(6,2), index = [['CF', 'CF', 'CF', 'DA', 'DA','DA'], ['x', 'y', 'y', 'x', 'y', 'y'], ['a', 'b', 'a', 'a', 'a', 'b']], columns = ['PC1', 'PC2'])
df_multi.index.names =['l1','l2','l3']

In [5]: df_multi
Out[5]: 
       PC1       PC2
l1 l2 l3                    
CF x  a   0.118061  0.473159
   y  b   0.159534  0.407676
      a   0.466731  0.163322
DA x  a   0.152799  0.333438
   y  a   0.632725  0.965348
      b   0.737112  0.834592

Now I want to sum across the third level and then divide each element by its corresponding sum to get shares at the third level (e.g. divide (CF, x, a) by (CF, x, a) and (CF, y, a) by (CF, y, a) + (CF, y, b), etc.)

In [6]: df_multi.sum(level = [0, 1])
Out[6]: 
    PC1       PC2
l1 l2                    
CF x   0.118061  0.473159
   y   0.626265  0.570998
DA x   0.152799  0.333438
   y   1.369837  1.799940

and

df_multi_share = df_multi.div(df_multi.sum(level = [0, 1]), level=[0, 1])

However this does not work. I am looking for a general solution, not confined to the computation of percentage shares, that enables me to do arithmetics matching on multiple levels. It does work when using only one level, e.g.

df_multi = DataFrame(np.random.rand(4,2), index = [['CF', 'CF', 'DA', 'DA'], ['1', '2', '1', '2']], columns = ['PC1', 'PC2'])

df_single = DataFrame(np.random.rand(3,3), index = ['1', '2', '3'], columns = ['PC1', 'PC2', 'PC3'])

df_combined = df_multi.mul(df_single, level = 1)
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
user2136502
  • 69
  • 1
  • 3

1 Answers1

2

This sounds like a job for transform.

df_multi.groupby(level=[0,1]).transform(lambda x: x/x.sum())

The relevant documentation: http://pandas.pydata.org/pandas-docs/stable/groupby.html#transformation

Dan Allan
  • 34,073
  • 6
  • 70
  • 63
  • 1
    Thanks, Allan. I looked more into .apply and .transform. However it seems that this allows to execute functions at the group level. It is a solution for my particular case. I would be happy however to find a more general solution that allows me to add for instance dataframe1 and dataframe2 where I match not only one one level of multi indices, but on multiple levels. E.g. dataframe1 has 3 levels, dataframe2 has two levels, and I want to add the two dataframes, matching the two levels of dataframe2 with the second and third level of dataframe1 and broadcast along the first level of dataframe1. – user2136502 Mar 07 '13 at 14:43