How to 'pivot' a dataframe's values into columns

Question

I'm trying to transform a dataframe

df = pd.DataFrame({
'c1': ['x','y','z'],
'c2': [[1,2,3],[1,3],[2,4]]})

which looks like

    c1  c2
0   x   [1, 2, 3]
1   y   [1, 3]
2   z   [2, 4]

into

p = pd.DataFrame({
    'c1': ['x','y','z'],
    1: [1,1,0],
    2: [1,0,1],
    3: [1,1,0],
    4: [0,0,1]
})

which looks like

    c1  1   2   3   4
0   x   1   1   1   0
1   y   1   0   1   0
2   z   0   1   0   1

The 1s and 0s are supposed to represent true and false. I'm still learning pivots, so please point me in the right direction.

jezrael · Accepted Answer · 2018-01-09T11:51:10.227

1

You can use:

from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()

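# One indicator column per distinct label found in the lists
df1 = pd.DataFrame(mlb.fit_transform(df['c2']), columns=mlb.classes_, index=df.index)

# Pass the column as a keyword: the positional axis argument was removed in pandas 2.0
df = df.drop(columns='c2').join(df1)
print(df)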

  c1  1  2  3  4
0  x  1  1  1  0
1  y  1  0  1  0
2  z  0  1  0  1

Another solution:

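# Join each list into a '|'-separated string, then split it into indicator columns
df1 = df['c2'].apply(lambda x: '|'.join([str(y) for y in x])).str.get_dummies()

# Pass the column as a keyword: the positional axis argument was removed in pandas 2.0
df = df.drop(columns='c2').join(df1)
print(df)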
  c1  1  2  3  4
0  x  1  1  1  0
1  y  1  0  1  0
2  z  0  1  0  1
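
A note on the `str.get_dummies` route: it works on strings, so the new column labels come back as `'1'`, `'2'`, ... rather than the integers in the desired frame, and the values are 0/1 rather than booleans. The following is a minimal sketch of one way to adjust both, assuming every list item is an integer; the names `dummies`, `flags` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'c1': ['x', 'y', 'z'],
    'c2': [[1, 2, 3], [1, 3], [2, 4]],
})

# Join each list into a '|'-separated string, then one-hot encode it.
dummies = df['c2'].apply(lambda values: '|'.join(str(v) for v in values)).str.get_dummies()

# The labels are strings at this point; cast them back to int and
# sort numerically (string sorting would put '10' before '2').
dummies.columns = dummies.columns.astype(int)
dummies = dummies.sort_index(axis=1)

# Turn the 0/1 indicators into booleans, as the question asks.
flags = dummies.astype(bool)

result = df.drop(columns='c2').join(flags)
print(result)
```

If 0/1 integers are preferred, the `astype(bool)` step can simply be left out.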

EDIT:

Thanks to MaxU for the nice suggestion:

df = df.join(pd.DataFrame(mlb.fit_transform(df.pop('c2')),
                          columns=mlb.classes_,
                          index=df.index))
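
The one-liner works because `DataFrame.pop` removes the column from `df` in place and returns it as a Series, so `c2` is already gone by the time `join` runs. A small sketch of that behaviour on the question's data (the variable name `c2` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'c1': ['x', 'y', 'z'],
    'c2': [[1, 2, 3], [1, 3], [2, 4]],
})

# pop() drops 'c2' from df in place and hands the column back as a Series.
c2 = df.pop('c2')

print(list(df.columns))  # ['c1']
print(c2.tolist())       # [[1, 2, 3], [1, 3], [2, 4]]
```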

edited Jan 09 '18 at 11:51

answered Jan 09 '18 at 11:15


yeah, i think MultiLabelBinarizer is the most idiomatic approach in this case. We could do it in one line: `df = df.join(pd.DataFrame(mlb.fit_transform(df.pop('c2')),columns=mlb.classes_, index=df.index))` – MaxU - stand with Ukraine Jan 09 '18 at 11:49
@MaxU - Thank you. – jezrael Jan 09 '18 at 11:51

score 0 · Answer 2 · answered Jan 09 '18 at 11:24

You can use

In [235]: df.join(pd.DataFrame([{x: 1 for x in r} for r in df.c2]).fillna(0))
Out[235]:
  c1         c2    1    2    3    4
0  x  [1, 2, 3]  1.0  1.0  1.0  0.0
1  y     [1, 3]  1.0  0.0  1.0  0.0
2  z     [2, 4]  0.0  1.0  0.0  1.0

Details

In [236]: pd.DataFrame([{x: 1 for x in r} for r in df.c2]).fillna(0)
Out[236]:
     1    2    3    4
0  1.0  1.0  1.0  0.0
1  1.0  0.0  1.0  0.0
2  0.0  1.0  0.0  1.0
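
The output above keeps `c2` and shows floats, because the missing keys become `NaN` before `fillna(0)` runs. A possible variation, sketched here under the assumption that integer indicators and no `c2` column are wanted (the names `indicators` and `result` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'c1': ['x', 'y', 'z'],
    'c2': [[1, 2, 3], [1, 3], [2, 4]],
})

# One dict per row: every value in the list becomes a key mapped to 1.
indicators = (
    pd.DataFrame([{x: 1 for x in row} for row in df['c2']], index=df.index)
    .fillna(0)           # values absent from a row show up as NaN
    .astype(int)         # cast the floats back to integers
    .sort_index(axis=1)  # keep the new columns in ascending order
)

result = df.drop(columns='c2').join(indicators)
print(result)
```

Passing `index=df.index` keeps the rows aligned for the join if `df` does not use the default 0..n-1 index.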
