I'm trying to transform a dataframe

import pandas as pd

df = pd.DataFrame({
    'c1': ['x','y','z'],
    'c2': [[1,2,3],[1,3],[2,4]]})

which looks like

    c1  c2
0   x   [1, 2, 3]
1   y   [1, 3]
2   z   [2, 4]

into

p = pd.DataFrame({
    'c1': ['x','y','z'],
    1: [1,1,0],
    2: [1,0,1],
    3: [1,1,0],
    4: [0,0,1]
})

which looks like

    c1  1   2   3   4
0   x   1   1   1   0
1   y   1   0   1   0
2   z   0   1   0   1

The 1s and 0s stand for True and False. I'm still learning pivots, so please point me in the right direction.

2 Answers

You can use MultiLabelBinarizer from scikit-learn:

from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()

df1 = pd.DataFrame(mlb.fit_transform(df['c2']),columns=mlb.classes_, index=df.index)

# pass axis by keyword; the positional form drop('c2', 1) fails on pandas 2.0+
df = df.drop('c2', axis=1).join(df1)
print(df)

  c1  1  2  3  4
0  x  1  1  1  0
1  y  1  0  1  0
2  z  0  1  0  1
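
Since the question asks for True/False rather than 1/0, here is a minimal self-contained sketch of the same MultiLabelBinarizer approach with a boolean cast at the end. The names `flags` and `result` are mine, not part of the answer above, and the `.astype(bool)` step is an assumption about what the asker wants.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame({
    'c1': ['x', 'y', 'z'],
    'c2': [[1, 2, 3], [1, 3], [2, 4]],
})

mlb = MultiLabelBinarizer()

# fit_transform returns a 0/1 array with one column per distinct label;
# mlb.classes_ holds those labels in sorted order
flags = pd.DataFrame(mlb.fit_transform(df['c2']),
                     columns=mlb.classes_,
                     index=df.index).astype(bool)

# drop the list column and attach the boolean indicator columns
result = df.drop(columns='c2').join(flags)
print(result)
```

Leave out `.astype(bool)` if you prefer to keep the integer 1/0 columns shown above.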

Another solution, using only pandas, joins each list into a '|'-separated string and calls str.get_dummies:

df1 = df['c2'].apply(lambda x: '|'.join([str(y) for y in x])).str.get_dummies()

# pass axis by keyword; the positional form drop('c2', 1) fails on pandas 2.0+
df = df.drop('c2', axis=1).join(df1)
print(df)
  c1  1  2  3  4
0  x  1  1  1  0
1  y  1  0  1  0
2  z  0  1  0  1
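
One thing to watch with the str.get_dummies route: the column labels come back as strings ('1', '2', ...), not the original integers. A small sketch that converts them back, assuming every value in the lists is an integer; `joined`, `dummies` and `result` are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame({
    'c1': ['x', 'y', 'z'],
    'c2': [[1, 2, 3], [1, 3], [2, 4]],
})

# Turn each list into a '|'-separated string, e.g. [1, 2, 3] -> '1|2|3';
# '|' is the default separator of Series.str.get_dummies
joined = df['c2'].apply(lambda values: '|'.join(str(v) for v in values))
dummies = joined.str.get_dummies()

# Labels are strings at this point; convert to int (assumes integer values)
# and sort numerically so that e.g. 10 would come after 2
dummies.columns = [int(label) for label in dummies.columns]
dummies = dummies.sort_index(axis=1)

result = df.drop(columns='c2').join(dummies)
print(result)
```

If the list values themselves can contain '|', pick a different separator and pass it as `sep=` to `str.get_dummies`.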

EDIT:

Thanks to MaxU for the nice suggestion:

df = df.join(pd.DataFrame(mlb.fit_transform(df.pop('c2')),
                          columns=mlb.classes_,
                          index=df.index))
jezrael

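You can build the indicator columns from a list of dicts, one per row. Note that this keeps c2 and the new columns come out as floats: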

In [235]: df.join(pd.DataFrame([{x: 1 for x in r} for r in df.c2]).fillna(0))
Out[235]:
  c1         c2    1    2    3    4
0  x  [1, 2, 3]  1.0  1.0  1.0  0.0
1  y     [1, 3]  1.0  0.0  1.0  0.0
2  z     [2, 4]  0.0  1.0  0.0  1.0

Details

In [236]: pd.DataFrame([{x: 1 for x in r} for r in df.c2]).fillna(0)
Out[236]:
     1    2    3    4
0  1.0  1.0  1.0  0.0
1  1.0  0.0  1.0  0.0
2  0.0  1.0  0.0  1.0
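
If you want integers and no c2 column, as in the question's expected output, a minimal sketch along the same lines follows. The `astype(int)`, `sort_index` and `drop` steps are my additions, not part of the one-liner above.

```python
import pandas as pd

df = pd.DataFrame({
    'c1': ['x', 'y', 'z'],
    'c2': [[1, 2, 3], [1, 3], [2, 4]],
})

# One dict per row: every value in the list becomes a key mapped to 1
rows = [{value: 1 for value in values} for values in df['c2']]

# Keys missing from a row become NaN, which is why the columns are floats;
# fill with 0, cast back to int, and sort the column labels
flags = (pd.DataFrame(rows, index=df.index)
           .fillna(0)
           .astype(int)
           .sort_index(axis=1))

result = df.drop(columns='c2').join(flags)
print(result)
```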
Zero