1

I have the following dataframe

import pandas as pd
df = pd.DataFrame({'id':[1,2,3,4,5,6], 'id_2':[6,5,4,3,2,1],
'col_1':['A','A','A','B','B','B'],
'col_2':['X','Z','X','Z','X','Z'],
'value':[10,20,30,40,50,60]})

And I want to dcast it, so I use

df= df.pivot_table(index=['id','id_2'], columns=['col_1', 'col_2'],aggfunc=lambda x: x)

I do not know how droplevel and change the df.columns into A_X,A_Z,B_X,B_Z. The multi-index confuses me

Any ideas ?

UPDATE

I would like to end up with

import numpy as np

df=pd.DataFrame({'id':[1,2,3,4,5,6], 'id_2':[6,5,4,3,2,1],
'A_X':[10,np.nan,30,np.nan,np.nan,np.nan],
'A_Z':[np.nan,20,np.nan,np.nan,np.nan,np.nan],
'B_X':[np.nan,np.nan,np.nan,np.nan,50,np.nan],
'B_Z':[np.nan,np.nan,np.nan,40,np.nan,60]})
quant
  • 4,062
  • 5
  • 29
  • 70

2 Answers2

3

You need remove top level value from Multiindex - by Index.droplevel or with list comprehension:

print (df.columns)
MultiIndex(levels=[['value'], ['A', 'B'], ['X', 'Z']],
           codes=[[0, 0, 0, 0], [0, 0, 1, 1], [0, 1, 0, 1]],
           names=[None, 'col_1', 'col_2'])

df.columns = df.columns.droplevel(0).map('_'.join)

Or:

df.columns = [f'{b}_{c}' for a,b,c in df.columns]

df = df.reset_index()
print (df)

   id  id_2   A_X   A_Z   B_X   B_Z
0   1     6  10.0   NaN   NaN   NaN
1   2     5   NaN  20.0   NaN   NaN
2   3     4  30.0   NaN   NaN   NaN
3   4     3   NaN   NaN   NaN  40.0
4   5     2   NaN   NaN  50.0   NaN
5   6     1   NaN   NaN   NaN  60.0

Another solution is specify value parameter in pivot_table:

df= df.pivot_table(index=['id','id_2'], columns=['col_1', 'col_2'], values='value')

print (df.columns)
MultiIndex(levels=[['A', 'B'], ['X', 'Z']],
           codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=['col_1', 'col_2'])

df.columns = df.columns.map('_'.join)
df = df.reset_index()
print (df)

   id  id_2   A_X   A_Z   B_X   B_Z
0   1     6  10.0   NaN   NaN   NaN
1   2     5   NaN  20.0   NaN   NaN
2   3     4  30.0   NaN   NaN   NaN
3   4     3   NaN   NaN   NaN  40.0
4   5     2   NaN   NaN  50.0   NaN
5   6     1   NaN   NaN   NaN  60.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • @quant - there is first level `value`, need remove it. – jezrael Apr 09 '19 at 09:17
  • I am sorry, I am not sure I understand – quant Apr 09 '19 at 09:18
  • 1
    @quant - ya, problem is there is no parameter `values` in `pivot_table`, so all columns create top level in `MultiIndex` after `pivot_table`. So first 2 solution dealing with it - first join only b,c for avoid first level a or use [`Index.droplevel`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.droplevel.html) for remove top level. But better is second solution - added parameter `values='value'`, then is no `value` first level, so `df.columns = df.columns.map('_'.join)` working nice. – jezrael Apr 09 '19 at 09:21
2
df2 = (df.pivot_table(index=['id','id_2'], columns=['col_1', 'col_2'],
                         values='value')
            .reset_index()
        )

Output:

    id  id_2    A   B
                X   Z   X   Z
0   1   6   10.0    NaN NaN  NaN
1   2   5   NaN    20.0 NaN  NaN
2   3   4   30.0    NaN NaN  NaN
3   4   3   NaN    NaN  NaN  40.0
4   5   2   NaN    NaN  50.0 NaN
5   6   1   NaN    NaN  NaN  60.0
min2bro
  • 4,509
  • 5
  • 29
  • 55