MultiLabelBinarizer: inverse_transform fails on a single sample?

Question

I want to apply the inverse_transform of MultiLabelBinarizer to a single sample, e.g.:

labels = ['Architektur & Garten',
          'Ganzheitliches Bewusstsein',
          'Glaube & Ethik',
          'Kinderbuch & Jugendbuch',
          'Künste',
          'Literatur & Unterhaltung',
          'Ratgeber',
          'Sachbuch']

samples = []
for l in labels:
   samples.append([l])

import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
m = MultiLabelBinarizer()
m.fit_transform(samples)

If I now apply inverse_transform to a matrix, it works:

s = np.array([[0, 1, 0, 0, 0, 0, 0, 0],[0, 0, 0, 0, 0, 0, 0, 1]])
m.inverse_transform(s)
[('Ganzheitliches Bewusstsein',), ('Sachbuch',)]

However, if I try to apply it to a single sample, i.e. a vector, it fails:

import numpy as np
s = np.array([0, 1, 0, 0, 0, 0, 0, 0])
m.inverse_transform(s)

--> 957         if yt.shape[1] != len(self.classes_):
    958             raise ValueError('Expected indicator for {0} classes, but got {1}'
    959                              .format(len(self.classes_), yt.shape[1]))

Looking through the source code, it seems inverse_transform always expects a two-dimensional array. - David Batista, Apr 07 '19 at 00:12

score 2 · Answer 1 · answered Jun 26 '19 at 18:10

Judging from your comment it looks like you've solved it. In case this helps someone else, some more detail:

In the first example, if we print the dimensions of s we get (2, 8):

>>> s = np.array([[0, 1, 0, 0, 0, 0, 0, 0],[0, 0, 0, 0, 0, 0, 0, 1]])
>>> s.shape
(2, 8)

In the second example, if we do the same thing, we get (8,):

>>> s = np.array([0, 1, 0, 0, 0, 0, 0, 0])
>>> s.shape
(8,)

The problem is in the second example. The traceback is helpful here: it shows that inverse_transform reads yt.shape[1], i.e. it expects a two-dimensional array of shape (n_samples, n_classes). A one-dimensional array has no second dimension, hence the error.
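As a minimal sketch of why the one-dimensional case fails, using only NumPy: the shape tuple of a 1-D array has a single entry, so indexing shape[1] raises an IndexError. The variable names matrix, vector and second_dim_available are illustrative and not part of the question.

```python
import numpy as np

matrix = np.array([[0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1]])
vector = np.array([0, 1, 0, 0, 0, 0, 0, 0])

print(matrix.shape)  # two entries: (n_samples, n_classes)
print(vector.shape)  # a single entry, so there is no shape[1]

# Accessing the missing second dimension is what goes wrong.
try:
    vector.shape[1]
    second_dim_available = True
except IndexError:
    second_dim_available = False

print(second_dim_available)
```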

This can be fixed two ways:

Just add square brackets: s = np.array([0, 1, 0, 0, 0, 0, 0, 0]) becomes s = np.array([[0, 1, 0, 0, 0, 0, 0, 0]])
Reshape: s = np.reshape(s, (1, s.shape[0]))

Either way, s.shape will then be (1, 8) and m.inverse_transform(s) will work as expected.
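Putting this together, here is a rough end-to-end sketch using the labels from the question. The helper inverse_transform_single is a hypothetical convenience wrapper, not part of scikit-learn; it turns the vector into a one-row matrix with reshape(1, -1), which is equivalent to the reshape shown above.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

labels = ['Architektur & Garten',
          'Ganzheitliches Bewusstsein',
          'Glaube & Ethik',
          'Kinderbuch & Jugendbuch',
          'Künste',
          'Literatur & Unterhaltung',
          'Ratgeber',
          'Sachbuch']

mlb = MultiLabelBinarizer()
mlb.fit([[label] for label in labels])


def inverse_transform_single(binarizer, row):
    """Hypothetical helper: decode a single indicator vector."""
    # Turn shape (n_classes,) into (1, n_classes) before decoding.
    row_2d = np.asarray(row).reshape(1, -1)
    # inverse_transform returns one tuple per row; take the only one.
    return binarizer.inverse_transform(row_2d)[0]


s = np.array([0, 1, 0, 0, 0, 0, 0, 0])
decoded = inverse_transform_single(mlb, s)
print(decoded)
```

Wrapping the reshape in a small function keeps the calling code free of shape bookkeeping, but calling m.inverse_transform(s.reshape(1, -1)) directly works just as well.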
