
There is a dataframe:

    0   1   2   3
0   a   c   e   NaN
1   b   d   NaN NaN
2   b   c   NaN NaN
3   a   b   c   d
4   a   b   NaN NaN
5   b   c   NaN NaN
6   a   b   NaN NaN
7   a   b   c   e
8   a   b   c   NaN
9   a   c   e   NaN

I would like to one-hot encode it across all columns, like this:

    a   c   e   b   d
0   1   1   1   0   0
1   0   0   0   1   1
2   0   1   0   1   0
3   1   1   0   1   1
4   1   0   0   1   0
5   0   1   0   1   0
6   1   0   0   1   0
7   1   1   1   1   0
8   1   1   0   1   0
9   1   1   1   0   0

pd.get_dummies does not work here because it encodes each column independently. How can I get this? By the way, the order of the columns doesn't matter.
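A minimal sketch that rebuilds the sample data (assuming the blank cells are np.nan) and shows why plain pd.get_dummies falls short. It creates separate indicator columns per source column, prefixed with that column's label, so the same value b is split across 0_b and 1_b instead of landing in a single b column.

```python
import numpy as np
import pandas as pd

# Sample data from the question; missing cells are assumed to be np.nan.
df = pd.DataFrame([
    ["a", "c", "e", np.nan],
    ["b", "d", np.nan, np.nan],
    ["b", "c", np.nan, np.nan],
    ["a", "b", "c", "d"],
    ["a", "b", np.nan, np.nan],
    ["b", "c", np.nan, np.nan],
    ["a", "b", np.nan, np.nan],
    ["a", "b", "c", "e"],
    ["a", "b", "c", np.nan],
    ["a", "c", "e", np.nan],
])

# pd.get_dummies encodes every column separately and prefixes each new
# column with the source column label, so "b" from column 0 and "b" from
# column 1 become two different indicator columns, "0_b" and "1_b".
per_column = pd.get_dummies(df)
print(list(per_column.columns))
```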

xiaoluohao

2 Answers


Try this:

# Stack to long form, one-hot encode each value, then take the max per original row
df.stack().str.get_dummies().groupby(level=0).max()

Out[129]:
   a  b  c  d  e
0  1  0  1  0  1
1  0  1  0  1  0
2  0  1  1  0  0
3  1  1  1  1  0
4  1  1  0  0  0
5  0  1  1  0  0
6  1  1  0  0  0
7  1  1  1  0  1
8  1  1  1  0  0
9  1  0  1  0  1
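To see what each step of this one-liner does, here is a sketch on the first three rows of the question's data (blank cells assumed to be np.nan). stack turns the frame into one long Series indexed by (row, column), str.get_dummies one-hot encodes each single value, and groupby(level=0).max() collapses the indicators back to one row per original row. On pandas 2.0 and later, max(level=0) no longer works because the level argument to reductions was removed; groupby(level=0).max() is the equivalent spelling.

```python
import numpy as np
import pandas as pd

# First three rows of the question's data; blanks assumed to be np.nan.
df = pd.DataFrame([
    ["a", "c", "e", np.nan],
    ["b", "d", np.nan, np.nan],
    ["b", "c", np.nan, np.nan],
])

# 1. Long form: one value per (row, column) pair, with a MultiIndex.
stacked = df.stack()

# 2. One-hot encode each single value; columns are the distinct labels.
#    Any NaN entries (if stack keeps them) simply encode as all zeros.
dummies = stacked.str.get_dummies()

# 3. Collapse back to one row per original row: a label gets 1 if it
#    appeared in any column of that row.
result = dummies.groupby(level=0).max()
print(result)
```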
Andy L.

One way using str.join and str.get_dummies:

import pandas as pd
# Join each row's non-null values with "|" (e.g. "a|c|e"), then one-hot encode.
one_hot = df.apply(lambda row: "|".join(row.dropna()), axis=1).str.get_dummies()
print(one_hot)

Output:

   a  b  c  d  e
0  1  0  1  0  1
1  0  1  0  1  0
2  0  1  1  0  0
3  1  1  1  1  0
4  1  1  0  0  0
5  0  1  1  0  0
6  1  1  0  0  0
7  1  1  1  0  1
8  1  1  1  0  0
9  1  0  1  0  1
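The intermediate strings built by the row-wise join can be inspected directly. The sketch below (again on the first three rows, blanks assumed to be np.nan) prints them and checks that this approach gives the same indicators as the stack-based answer. Because str.get_dummies splits on "|" by default, this assumes no value itself contains "|". The row-wise apply also runs a Python function per row, so it is likely slower than the stack version on large frames.

```python
import numpy as np
import pandas as pd

# First three rows of the question's data; blanks assumed to be np.nan.
df = pd.DataFrame([
    ["a", "c", "e", np.nan],
    ["b", "d", np.nan, np.nan],
    ["b", "c", np.nan, np.nan],
])

# Row-wise join of the non-null values into one "|"-separated string.
joined = df.apply(lambda row: "|".join(row.dropna()), axis=1)
print(joined.tolist())

# str.get_dummies splits each string on "|" (its default sep).
one_hot = joined.str.get_dummies()

# Cross-check against the stack-based answer (compare labels and values).
via_stack = df.stack().str.get_dummies().groupby(level=0).max()
same = (
    list(one_hot.columns) == list(via_stack.columns)
    and one_hot.values.tolist() == via_stack.values.tolist()
)
print(same)
```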
Chris