Pandas pivot table for multiple columns at once

Question

Let's say I have a DataFrame:

   nj  ptype  wd  wpt
0   2      1   2    1
1   3      2   1    2
2   1      1   3    1
3   2      2   3    3
4   3      1   2    2
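To make the snippets on this page reproducible, here is a minimal sketch that rebuilds the sample frame. The rows and columns are copied from the table above.

```python
import pandas as pd

# Sample data from the question (same rows and columns as the table above)
df = pd.DataFrame({
    'nj':    [2, 3, 1, 2, 3],
    'ptype': [1, 2, 1, 2, 1],
    'wd':    [2, 1, 3, 3, 2],
    'wpt':   [1, 2, 1, 3, 2],
})
print(df)
```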

I would like to aggregate this data using ptype as the index like so:

             nj             wd            wpt
       1.0  2.0  3.0  1.0  2.0  3.0  1.0  2.0  3.0
ptype    
    1    1    1    1    0    2    1    2    1    0
    2    0    1    1    1    0    1    0    1    1

You could build each of the top-level columns of the final result by creating a pivot table with aggfunc='count' and then concatenating them all, like so:

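import pandas as pd

# One pivot table per column; aggfunc='count' counts the non-null entries of every
# remaining column, so selecting any one of them (here 'wd' or 'nj') gives the counts
nj = df.pivot_table(index='ptype', columns='nj', aggfunc='count').loc[:, 'wd']
wpt = df.pivot_table(index='ptype', columns='wpt', aggfunc='count').loc[:, 'wd']
wd = df.pivot_table(index='ptype', columns='wd', aggfunc='count').loc[:, 'nj']
out = pd.concat([nj, wd, wpt], axis=1, keys=['nj', 'wd', 'wpt']).fillna(0)
out.columns.names = [None, None]
print(out)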
        nj             wd            wpt
         1    2    3    1    2    3    1    2    3
ptype
1      1.0  1.0  1.0  0.0  2.0  1.0  2.0  1.0  0.0
2      0.0  1.0  1.0  1.0  0.0  1.0  0.0  1.0  1.0

But I really dislike this; it feels wrong. Is there a simpler way to do this, preferably with a built-in method? Thanks in advance!
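The concatenation idea in the question can at least be written without repeating one line per column. A minimal sketch, assuming the sample `df` from the question; it swaps in `pd.crosstab` (not used in the question's code), which counts value occurrences per `ptype` and already fills missing combinations with 0.

```python
import pandas as pd

df = pd.DataFrame({
    'nj':    [2, 3, 1, 2, 3],
    'ptype': [1, 2, 1, 2, 1],
    'wd':    [2, 1, 3, 3, 2],
    'wpt':   [1, 2, 1, 3, 2],
})

cols = ['nj', 'wd', 'wpt']
# One crosstab per column (ptype x values of that column); passing a dict to
# pd.concat uses the dict keys as the outer column level
out = pd.concat({col: pd.crosstab(df['ptype'], df[col]) for col in cols}, axis=1)
out.columns.names = [None, None]
print(out)
```

Because `crosstab` returns integer counts with zeros already filled in, the trailing `fillna(0)` from the question is not needed here.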

Psidom · Accepted Answer · 2017-05-25T17:54:34.970

Instead of doing it in one step, you can do the aggregation first and then pivot the result using the unstack method:

(df.set_index('ptype')
 .groupby(level='ptype')
 # count the values of each column (nj, wd, wpt) within each ptype group
 # using groupby + value_counts
 .apply(lambda g: g.apply(pd.Series.value_counts))
 .unstack(level=1)
 .fillna(0))

#      nj             wd            wpt
#       1    2    3    1    2    3    1    2    3
#ptype                                  
#1    1.0  1.0  1.0  0.0  2.0  1.0  2.0  1.0  0.0
#2    0.0  1.0  1.0  1.0  0.0  1.0  0.0  1.0  1.0
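To see what `unstack` does here, it can help to look at the intermediate result before pivoting. A minimal sketch, assuming the sample `df` from the question; the names `counts` and `wide` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'nj':    [2, 3, 1, 2, 3],
    'ptype': [1, 2, 1, 2, 1],
    'wd':    [2, 1, 3, 3, 2],
    'wpt':   [1, 2, 1, 3, 2],
})

# Step 1: value counts of every column within each ptype group.
# Rows are (ptype, value) pairs; columns are still nj, wd, wpt.
counts = (df.set_index('ptype')
            .groupby(level='ptype')
            .apply(lambda g: g.apply(pd.Series.value_counts)))
print(counts)

# Step 2: move the value level (level 1) from the rows into the columns,
# giving (column, value) pairs as a two-level column index
wide = counts.unstack(level=1).fillna(0)
print(wide)
```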

Another option, which avoids the apply method:

(df.set_index('ptype').stack()
 .groupby(level=[0,1])
 .value_counts()
 .unstack(level=[1,2])
 .fillna(0)
 .sort_index(axis=1))
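A closely related idea, swapped in here rather than taken from the answer, is to reshape to long format with `melt` (instead of `stack`) and let `pd.crosstab` do the counting. A minimal sketch, assuming the sample `df`; the `var_name='column'` label is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({
    'nj':    [2, 3, 1, 2, 3],
    'ptype': [1, 2, 1, 2, 1],
    'wd':    [2, 1, 3, 3, 2],
    'wpt':   [1, 2, 1, 3, 2],
})

# Long format: one row per (ptype, column name, value)
long = df.melt(id_vars='ptype', var_name='column', value_name='value')

# Cross-tabulate ptype against (column, value) pairs; crosstab returns integer
# counts and fills combinations that never occur with 0
out = pd.crosstab(long['ptype'], [long['column'], long['value']])
print(out)
```

Like option two, this only creates columns for (column, value) pairs that occur somewhere in the data.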

Naive timings on the sample data:

Original solution:

%%timeit
nj = df.pivot_table(index='ptype', columns='nj', aggfunc='count').loc[:, 'wd']
wpt = df.pivot_table(index='ptype', columns='wpt', aggfunc='count').loc[:, 'wd']
wd = df.pivot_table(index='ptype', columns='wd', aggfunc='count').loc[:, 'nj']
out = pd.concat([nj, wd, wpt], axis=1, keys=['nj', 'wd', 'wpt']).fillna(0)
out.columns.names = [None, None]
# 100 loops, best of 3: 12 ms per loop

Option one:

%%timeit
(df.set_index('ptype')
 .groupby(level='ptype')
 .apply(lambda g: g.apply(pd.Series.value_counts))
 .unstack(level=1)
 .fillna(0))
# 100 loops, best of 3: 10.1 ms per loop

Option two:

%%timeit 
(df.set_index('ptype').stack()
 .groupby(level=[0,1])
 .value_counts()
 .unstack(level=[1,2])
 .fillna(0)
 .sort_index(axis=1))
# 100 loops, best of 3: 4.3 ms per loop
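Timings on a five-row frame say little about larger inputs. A rough sketch for re-running the comparison on a bigger frame, assuming randomly generated data of about 80K rows with codes 1 to 3 (hypothetical data, not the real dataset discussed in the comments); `option_one` and `option_two` are illustrative wrapper names.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 80K rows of random small integer codes
rng = np.random.default_rng(0)
n = 80_000
big = pd.DataFrame({
    'nj': rng.integers(1, 4, n),
    'ptype': rng.integers(1, 3, n),
    'wd': rng.integers(1, 4, n),
    'wpt': rng.integers(1, 4, n),
})

def option_one(df):
    # groupby + nested apply(value_counts), then unstack
    return (df.set_index('ptype')
              .groupby(level='ptype')
              .apply(lambda g: g.apply(pd.Series.value_counts))
              .unstack(level=1)
              .fillna(0))

def option_two(df):
    # stack to long form, grouped value_counts, then unstack
    return (df.set_index('ptype').stack()
              .groupby(level=[0, 1])
              .value_counts()
              .unstack(level=[1, 2])
              .fillna(0)
              .sort_index(axis=1))

for f in (option_one, option_two):
    secs = timeit.timeit(lambda: f(big), number=5) / 5
    print(f'{f.__name__}: {secs * 1000:.1f} ms per call')
```

Absolute numbers depend on the machine and pandas version, so only the relative ordering is worth comparing.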

Definitely works, but it seems to be slower than my solution for a dataframe with ~80K rows. — Grr, May 25 '17 at 13:21
@Grr You might try the second option if performance is an issue; it seems to be faster because it avoids the loop (the double `apply` call). — Psidom, May 25 '17 at 17:52
Coming back around to this one, I found that I occasionally have data where `nj` has the unique values `[1,2]` instead of `[1,2,3]`. In that case I feel option 1 is more informative, as it includes the column for value 3 filled with zeros. All in all, the time is only slightly longer than my original method, but as I said, I feel it includes more information. Thanks! — Grr, Jul 10 '17 at 18:34
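If a fixed set of columns is wanted regardless of which values happen to occur, one option (not from the thread) is to reindex the result against the full grid of (column, value) pairs. A minimal sketch, assuming the possible codes 1, 2, 3 are known up front and using hypothetical data in which `nj` never takes the value 3.

```python
import pandas as pd

# Hypothetical data: nj only takes the values 1 and 2
df = pd.DataFrame({
    'nj':    [2, 1, 1, 2, 2],
    'ptype': [1, 2, 1, 2, 1],
    'wd':    [2, 1, 3, 3, 2],
    'wpt':   [1, 2, 1, 3, 2],
})

# Option two only creates columns for (column, value) pairs that occur
out = (df.set_index('ptype').stack()
         .groupby(level=[0, 1])
         .value_counts()
         .unstack(level=[1, 2])
         .fillna(0)
         .sort_index(axis=1))

# Assumption: every column can take the codes 1, 2, 3; missing pairs become 0
full = pd.MultiIndex.from_product([['nj', 'wd', 'wpt'], [1, 2, 3]])
out = out.reindex(columns=full, fill_value=0)
print(out)
```

This keeps the faster option two while still producing the zero-filled column for value 3.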

score 1 · Answer 2 · answered May 25 '17 at 13:34

Another solution using groupby and unstack.

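# Count each (ptype, value) pair per column, pivot the values into columns, join side by side
df2 = pd.concat([df.groupby(['ptype', e])[e].count().unstack() for e in ['nj', 'wd', 'wpt']],
                axis=1).fillna(0).astype(int)
# Labels set by hand: assumes every column takes exactly the values 1, 2, 3
df2.columns = pd.MultiIndex.from_product([['nj', 'wd', 'wpt'], [1.0, 2.0, 3.0]])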

df2
Out[207]: 
       nj          wd         wpt        
      1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0
ptype                                    
1       1   1   1   0   2   1   2   1   0
2       0   1   1   1   0   1   0   1   1
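The hand-written `MultiIndex` above breaks if a column does not take exactly the values 1, 2, 3. A minimal variant, assuming the sample `df`, lets `pd.concat(..., keys=...)` build the outer level instead and uses `size()` in place of `count()` (equivalent here because there are no missing values).

```python
import pandas as pd

df = pd.DataFrame({
    'nj':    [2, 3, 1, 2, 3],
    'ptype': [1, 2, 1, 2, 1],
    'wd':    [2, 1, 3, 3, 2],
    'wpt':   [1, 2, 1, 3, 2],
})

cols = ['nj', 'wd', 'wpt']
# Rows per (ptype, value) pair for each column, values pivoted into columns;
# keys= supplies the outer column level, so the value labels come from the data
df2 = (pd.concat([df.groupby(['ptype', c]).size().unstack(fill_value=0)
                  for c in cols],
                 axis=1, keys=cols)
         .fillna(0)
         .astype(int))
print(df2)
```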

score -2 · Answer 3 · answered Nov 19 '19 at 16:48


An easier solution is:

employee.pivot_table(index='Title', values='Salary', aggfunc=['mean', 'median', 'min', 'max', 'std'], fill_value=0)

In this case, several different aggregate functions (mean, median, min, max, std) are applied to the Salary column.
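A minimal runnable sketch of this pattern; the answer does not show the `employee` frame, so the titles and salaries below are hypothetical. Passing a list to `aggfunc` gives a two-level column index of (aggregation name, value column).

```python
import pandas as pd

# Hypothetical employee data (the original answer does not show its frame)
employee = pd.DataFrame({
    'Title':  ['Analyst', 'Analyst', 'Engineer', 'Engineer', 'Engineer'],
    'Salary': [50, 60, 70, 80, 90],
})

# One value column, several aggregations; columns are (function name, 'Salary')
summary = employee.pivot_table(index='Title', values='Salary',
                               aggfunc=['mean', 'median', 'min', 'max', 'std'],
                               fill_value=0)
print(summary)
```

Note that this aggregates a single value column with several functions, which is a different shape of result from the per-value counts asked about in the question.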

Himanshu Aggarwal
