I have a dataframe like this:

         mid   value   label
    ID
    192    3   176.6   [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
    192    4    73.6   [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
    192    5    15.8   [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
    194    3  9603.2   [0, 0, 0, 0, 0, 9, 6, 1, 8, ...

I want to apply MultiLabelBinarizer after removing the duplicate values from each list in the label column.

I have tried looping over the frame and removing the duplicates. In addition, the MultiLabelBinarizer doesn't work and throws an exception:

    from sklearn.preprocessing import MultiLabelBinarizer
    mlb = MultiLabelBinarizer()
    mlb.fit(y_train.data)

X_train contains the mid and value columns, y_train contains the label values, and ID is the index.

I expect to get a prediction from the above values once the duplicate values have been removed from each list in the label column.
Celius Stingher
  • This is the format of the dataframe. 192 3 176.6 [9, 6, 8, 0, 8, 8, 7, 9, 2, 19... 1 192 4 73.6 [9, 6, 8, 0, 8, 8, 7, 9, 2, 19... 192 5 15.8 [9, 6, 8, 0, 8, 8, 7, 9, 2, 19... 194 3 9603.2 [0, 0, 0, 0, 0, 9, 6, 1, 8, ... – Sample Test Nov 11 '19 at 13:01
  • Possible duplicate of [Transform pandas Data Frame to use for MultiLabelBinarizer](https://stackoverflow.com/questions/53494873/transform-pandas-data-frame-to-use-for-multilabelbinarizer) – PV8 Nov 11 '19 at 14:15

1 Answer

Let's assume your dataframe is named df:

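    import pandas as pd
    from sklearn.preprocessing import MultiLabelBinarizer

    # Collect the label values of each (ID, mid, value) group into a tuple
    df2 = pd.DataFrame(df.groupby(['ID','mid', 'value'])['label'].apply(lambda x: tuple(x.values)))
    df2.reset_index(inplace=True)

    # Fit the binarizer on the grouped labels, then encode them
    mlb = MultiLabelBinarizer()
    mlb.fit(df2['label'])
    mlb.transform(df2['label'])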
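
The question also asks for duplicates to be removed from each list first. Here is a minimal sketch of that step followed by the binarization, assuming every cell of label already holds a Python list of hashable values. The small frame and the name binarized are made up for illustration, because the real lists are truncated in the post. If the cells hold lists, the grouping step may not be needed at all, since each row can be passed to the binarizer directly.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical stand-in for the question's frame; the label lists are invented
# because the originals are truncated in the post.
df = pd.DataFrame(
    {
        "mid": [3, 4, 5, 3],
        "value": [176.6, 73.6, 15.8, 9603.2],
        "label": [
            [9, 6, 8, 0, 8],
            [9, 6, 8, 0, 8],
            [9, 6, 8, 0, 8],
            [0, 0, 9, 6, 1],
        ],
    },
    index=pd.Index([192, 192, 192, 194], name="ID"),
)

# Drop repeated values inside each list while keeping first-seen order.
df["label"] = df["label"].apply(lambda labels: list(dict.fromkeys(labels)))

# One indicator column per distinct label value across all rows.
mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(df["label"])

# Wrap the array in a frame so the columns carry the label values
# and the rows keep the original ID index.
binarized = pd.DataFrame(encoded, columns=mlb.classes_, index=df.index)
```

The indicator matrix only records whether a label is present in a row, so deduplicating beforehand should not change the encoding; it mainly keeps the label column itself tidy.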
PV8