3

Given a list of list, in which each sublist is a bucket filled with letters, like:

L=[['a','c'],['b','e'],['d']]

I would like to encode each sublist as one row in my DataFrame like this:

    a   b   c   d   e
0   1   0   1   0   0
1   0   1   0   0   1
2   0   0   0   1   0

Let's assume the letter is just from 'a' to 'e'. I am wondering how to complete a function to do so.

jpp
  • 159,742
  • 34
  • 281
  • 339
Garvey
  • 1,197
  • 3
  • 13
  • 26

1 Answers1

4

You can use the sklearn library:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

L = [['a', 'c'], ['b', 'e'], ['d']]

mlb = MultiLabelBinarizer()

res = pd.DataFrame(mlb.fit_transform(L),
                   columns=mlb.classes_)

print(res)

   a  b  c  d  e
0  1  0  1  0  0
1  0  1  0  0  1
2  0  0  0  1  0
jpp
  • 159,742
  • 34
  • 281
  • 339