Let's say I have the following samples with their respective label sets, where X1, X2, X3, X4, X5, X6 are samples and Y1, Y2, Y3, Y4 are labels:

X1 : {Y2, Y3}
X2 : {Y1}
X3 : {Y2}
X4 : {Y2, Y3}
X5 : {Y1, Y2, Y3, Y4}
X6 : {Y2}

How do I transform them to the following?

X1 : y1
X2 : y2
X3 : y3
X4 : y1
X5 : y4
X6 : y3

My understanding is that this is how the Label Powerset method transforms labels. However, I do not want to classify using that method; I just want to convert the labels.

We have MultiLabelBinarizer to convert multi-label data into binary (two-class) indicators, but it only produces 0s and 1s.

Jaya A
  • Can you give details on what the objects y1 and Y1 are, and what types they have? – Benjamin Breton May 24 '22 at 12:34
  • ``X1,X2,X3,X4,X5,X6`` are samples and ``Y1,Y2,Y3,Y4`` are labels – Jaya A May 24 '22 at 14:34
  • What's the motivation behind the transformation? Are you just assigning the labels randomly? – Olasimbo May 24 '22 at 14:47
  • @OlasimboArigbabu The purpose is to transform multi-label classification problems into multi-class classification problems. This is similar to the Label Powerset method. But, I just wanted to convert the labels. – Jaya A May 25 '22 at 00:44

1 Answer

If you just want to map sequences of labels to a new label, you could convert those sequences to their string representation and use the LabelEncoder from sklearn.

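from sklearn import preprocessing

# Each tuple is the label sequence of one sample.
Y = [(1, 2), (1, 2, 3, 4), (1,)]

# Fit the encoder on the string form of each tuple.
le = preprocessing.LabelEncoder()
le.fit([str(y) for y in Y])

# Each distinct string maps to one integer class.
le.transform([str((1,)), str((1, 2))])
# array([2, 0])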

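Be wary, though: any label sequence in your test set that was not seen during fitting won't be supported by the label encoder (transform raises a ValueError for unseen values). This suggestion also assumes that each sequence lists its labels in a consistent order and without repeats, since the string representation is what gets compared.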
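If you would rather not depend on string representations, the same mapping can be sketched in plain Python without scikit-learn. This is an alternative to LabelEncoder, not what it does internally: the helper name `encode_label_sets` and the `samples` dictionary are made up for illustration, the data is copied from the question, and ids are assumed to be handed out in order of first appearance. Using a `frozenset` as the key makes the lookup independent of label order and repeats.

```python
# Label sets from the question, keyed by sample name.
samples = {
    "X1": {"Y2", "Y3"},
    "X2": {"Y1"},
    "X3": {"Y2"},
    "X4": {"Y2", "Y3"},
    "X5": {"Y1", "Y2", "Y3", "Y4"},
    "X6": {"Y2"},
}


def encode_label_sets(label_sets):
    """Assign one integer id per distinct label set, in order of first appearance."""
    ids = {}      # frozenset of labels -> integer id
    encoded = []  # one id per input label set, in input order
    for labels in label_sets:
        key = frozenset(labels)  # order-insensitive and ignores repeated labels
        if key not in ids:
            ids[key] = len(ids) + 1
        encoded.append(ids[key])
    return encoded, ids


encoded, ids = encode_label_sets(samples.values())
result = {name: f"y{i}" for name, i in zip(samples, encoded)}
print(result)
# {'X1': 'y1', 'X2': 'y2', 'X3': 'y3', 'X4': 'y1', 'X5': 'y4', 'X6': 'y3'}
```

Samples with identical label sets (X1 and X4, X3 and X6) end up with the same class. Keep the `ids` dictionary if you need to encode new data later, and decide up front how to handle a label set that is not in it.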

amiasato
  • Can this method map identical label sets to the same value? For example, let's say X1 has the labels {y1, y2} and X3 also has the labels {y1, y2}. Both have the same label set, so could both get the same new label, say Z1? Is this possible? – Jaya A May 25 '22 at 00:49