How to use two different functions within crosstab/pivot_table in pandas?

Question

Using pandas, is it possible to compute a single cross-tabulation (or pivot table) containing values calculated from two different functions?

import pandas as pd
import numpy as np

c1 = np.repeat(['a','b'], [50, 50], axis=0)
c2 = list('xy'*50)
c3 = np.repeat(['G1','G2'], [50, 50], axis=0)
np.random.shuffle(c3)
c4=np.repeat([1,2], [50,50],axis=0)
np.random.shuffle(c4)
val = np.random.rand(100)

df = pd.DataFrame({'c1':c1, 'c2':c2, 'c3':c3, 'c4':c4, 'val':val})

frequencyTable = pd.crosstab([df.c1,df.c2],[df.c3,df.c4])
meanVal = pd.crosstab([df.c1,df.c2],[df.c3,df.c4],values=df.val,aggfunc=np.mean)

So, both the rows and the columns are the same in both tables, but what I'd really like is a table with both frequencies and mean values:

c3           G1                       G2          
c4     1              2              1              2
c1 c2  freq val       freq val       freq val       freq val         
a  x   6    0.624931  5    0.582268  8    0.528231  6    0.362804
   y   7    0.493890  8    0.465741  3    0.613126  7    0.312894
b  x   9    0.488255  5    0.804015  6    0.722640  5    0.369480
   y   6    0.462653  4    0.506791  5    0.583695  10   0.517954

Did you try using `aggfunc=[func1,func2]`? I have no idea if this works but I think I've seen it before somewhere. — Brian, Sep 04 '13 at 17:36
@Brian: Yes, I just tried out joris answer, and it works great! Many thanks! — HappyPy, Sep 04 '13 at 18:07
@Brian Yeah, sorry, I was already typing my answer when I saw your comment :-) — joris, Sep 04 '13 at 18:15
@joris No worries. Mine was a shot in the dark. You actually knew what you were doing lol. — Brian, Sep 04 '13 at 18:16

joris · Accepted Answer · 2013-09-04T17:54:36.700

You can give a list of functions:

pd.crosstab([df.c1,df.c2], [df.c3,df.c4], values=df.val, aggfunc=[len, np.mean])

If you want the table as shown in your question, you will have to rearrange the levels a bit:

In [42]: table = pd.crosstab([df.c1,df.c2], [df.c3,df.c4], values=df.val, aggfunc=[len, np.mean])

In [43]: table
Out[43]: 
       len                mean                              
c3      G1     G2           G1                  G2          
c4       1  2   1  2         1         2         1         2
c1 c2                                                       
a  x     4  6   8  7  0.303036  0.414474  0.624900  0.425234
   y     5  5   8  7  0.543363  0.480419  0.583499  0.637657
b  x    10  6   4  5  0.400279  0.436929  0.442924  0.287572
   y     6  8   5  6  0.400427  0.623319  0.764506  0.408708

In [44]: table.reorder_levels([1, 2, 0], axis=1).sort_index(axis=1)
Out[44]: 
c3      G1                            G2                         
c4       1              2              1              2          
       len      mean  len      mean  len      mean  len      mean
c1 c2                                                            
a  x     4  0.303036    6  0.414474    8  0.624900    7  0.425234
   y     5  0.543363    5  0.480419    8  0.583499    7  0.637657
b  x    10  0.400279    6  0.436929    4  0.442924    5  0.287572
   y     6  0.400427    8  0.623319    5  0.764506    6  0.408708

BTW, is it possible to make "len" and "mean" columns of their own when reshaping with melt? I tried: pd.melt(table.reset_index(), id_vars=['c1','c2']), but I get a column named "NaN" with "len" and "mean" as elements. Thanks! — HappyPy, Sep 05 '13 at 17:50
What do you exactly mean with "columns of their own"? Do you want one column with "mean" and one column with "len" besided the columns "c1", "c2", "c3", "c4", as you would get with `df.groupby(["c1", "c2", "c3", "c4"]).aggregate([len, np.mean]).reset_index()`? In that case you can do it directly instead of via crosstab and melt. — joris, Sep 05 '13 at 19:36

How to use two different functions within crosstab/pivot_table in pandas?

1 Answers1