2

I am trying to get similar excel functionality in pandas, mainly a % of type behavior. Using the following data:

{'A': ['a', 'b', 'b', 'a', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'a', 'b'], 
 'C': ['e', 'e', 'e', 'f', 'f', 'f', 'f', 'f', 'f', 'f', 'f', 'e', 'e'], 
 'B': ['c', 'c', 'c', 'c', 'c', 'd', 'd', 'd', 'd', 'd', 'c', 'c', 'd'], 
 'D': ['g', 'g', 'h', 'h', 'g', 'g', 'h', 'h', 'g', 'g', 'h', 'h', 'g'], 
 'V1': [84.0, 440.0, 423.0, 63.0, 990.0, 192.0, 169.0, 387.0, 934.0, 208.0, 834.0, 923.0, 230.0], 
 'V2': [120.0, 942.0, 153.0, 284.0, 517.0, 695.0, 37.0, 30.0, 237.0, 56.0, 15.0, 696.0, 25.0]}

I create a DataFrameobject called df1 from this dictionary.

I want to display ultimately:

B   C   V1  V2  V2 as Percent of B
c   e   1870    1911    0.700770077
c   f   1887    816 0.299229923
d   e   230 25  0.023148148
d   f   1890    1055    0.976851852

I can get from Pandas pivot_table by executing pivot_table(df1,values=['V1','V2'],rows=['B','C'],aggfunc=numpy.sum,fill_value=0):

       V1    V2
B C            
c e  1870  1911
  f  1887   816
d e   230    25
  f  1890  1055

anyone have an idea how to do that last step to get the column?

Thanks! Jon

Jon
  • 734
  • 1
  • 7
  • 30

1 Answers1

5

Here's one way using a groupby to B level of the DataFrame:

In [11]: p
Out[11]: 
       V1    V2
B C            
c e  1870  1911
  f  1887   816
d e   230    25
  f  1890  1055

In [12]: g = p.groupby(level='B')

and take the percentage of each of the V2s in each group:

In [13]: g['V2'].apply(lambda s: s.astype(float) / s.sum())
Out[13]: 
B  C
c  e    0.700770
   f    0.299230
d  e    0.023148
   f    0.976852
dtype: float64

Finally, assign this as a column:

In [14]: p['PercentOfB'] = g['V2'].apply(lambda s: s.astype(float) / s.sum())

In [15]: p
Out[15]: 
       V1    V2  PercentOfB
B C                        
c e  1870  1911    0.700770
  f  1887   816    0.299230
d e   230    25    0.023148
  f  1890  1055    0.976852

To expand a little on what's going on here, during the apply the function is called on each group (in this case there are two, one for B='c' and one for B='d'), here's the c group:

In [21]: c
Out[21]: 
B  C
c  e    1911
   f     816
Name: c, dtype: int64

and the interested result is to divide through by the total:

In [22]: c.sum()
Out[22]: 2727

Unfortunately in python 2, integer division isn't "proper":

In [23]: c / c.sum()
Out[23]: 
B  C
c  e    0
   f    0
Name: c, dtype: int64

So we need to fix it by making them floats, usually I use astype(float) or * 1.0 to force this:

In [24]: c.astype(float) / c.sum()
Out[24]: 
B  C
c  e    0.70077
   f    0.29923
Name: c, dtype: float64

Apply then outputs this together with the B='d' group to get the desired result.

Note: Regarding how I got c, as I find this a very useful trick to writing applies.

I created a dummy function and an empty list, and applied it to the groupby:

a = []
def f(x):
    a.append(x)
    return x

g['v2'].apply(f)

c = a[0]

Then I played with that 'til I got what I want.

Andy Hayden
  • 359,921
  • 101
  • 625
  • 535