Stacking list of lists vertically using np.vstack is throwing an error

Question

I am following the code at http://queirozf.com/entries/scikit-learn-pipeline-examples to develop a multilabel OneVsRest classifier for text. I would like to compute the hamming_score, so I need to binarize my test labels as well. I have:

    X_train, X_test, labels_train, labels_test = train_test_split(meetings, labels, test_size=0.4)

Here, labels_train and labels_test are lists of lists, for example:

    [['dog', 'cat'], ['cat'], ['people'], ['nice', 'people']]

Now I need to binarize all my labels, so I am doing this:

    all_labels = np.vstack([labels_train, labels_test])
    mlb = MultiLabelBinarizer().fit(all_labels)

This is what the linked post does, but it throws:

    ValueError: all the input array dimensions except for the concatenation axis must match exactly

I also tried np.column_stack, as suggested in this question:

numpy array concatenate: "ValueError: all the input arrays must have same number of dimensions"

but that throws the same error.

How can the dimensions be the same if I am splitting into train and test? I am bound to get different shapes, right? Please help, thank you.

When using functions like `vstack` and `column_stack` make sure you know the `shape` of the component arrays - or arrays that will be produced with `np.array(....)`. Don't throw variables together and hope they work. — hpaulj, May 14 '18 at 23:27
That `dog/cat` list has 4 items, some are 2 long, some 1 long. That does not look good for `stacking`. What do you want to produce? — hpaulj, May 14 '18 at 23:29
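
The shape check suggested in the comments can be sketched roughly as follows. The label values and split sizes here are illustrative assumptions, not the asker's data, and `dtype=object` is passed explicitly so the ragged lists convert the same way across NumPy versions.

```python
import numpy as np

# Illustrative splits in the same form as the question (assumed values).
labels_train = [['dog', 'cat'], ['cat'], ['people']]
labels_test = [['nice', 'people'], ['dog']]

# The rows have different lengths, so there is no rectangular 2-D shape.
row_lengths = [len(row) for row in labels_train + labels_test]
print(row_lengths)  # [2, 1, 1, 2, 1]

# dtype=object keeps each ragged list as one element of a 1-D array.
train_arr = np.array(labels_train, dtype=object)
test_arr = np.array(labels_test, dtype=object)
print(train_arr.shape, test_arr.shape)  # (3,) (2,)

# vstack treats each 1-D array as a single row, so it tries to stack
# shapes (1, 3) and (1, 2), which do not match on the second axis.
try:
    np.vstack([train_arr, test_arr])
    stack_failed = False
except ValueError as err:
    stack_failed = True
    print("vstack failed:", err)
```

Because the train and test splits contain different numbers of samples, the two "rows" have different lengths, which is why the stacking call fails.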

score 0 · Answer 1 · answered May 15 '18 at 04:27

MultiLabelBinarizer works on a list of lists directly, so you don't need to stack them with numpy. Just concatenate the two lists and pass the result:

    from sklearn.preprocessing import MultiLabelBinarizer

    all_labels = labels_train + labels_test  # plain list concatenation
    mlb = MultiLabelBinarizer().fit(all_labels)
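
As a minimal end-to-end sketch of this approach, the fitted binarizer can then transform each split separately. The label values and the `y_pred` array are made up for illustration. The question does not define `hamming_score`, so scikit-learn's `hamming_loss` is used here as a stand-in metric.

```python
from sklearn.metrics import hamming_loss
from sklearn.preprocessing import MultiLabelBinarizer

# Illustrative splits (assumed values, not the asker's data).
labels_train = [['dog', 'cat'], ['cat'], ['people']]
labels_test = [['nice', 'people'], ['dog']]

# Fit on every label so train and test share the same columns.
mlb = MultiLabelBinarizer().fit(labels_train + labels_test)
print(mlb.classes_)  # ['cat' 'dog' 'nice' 'people']

# Each split is binarized on its own; only the column count must agree.
y_train = mlb.transform(labels_train)
y_test = mlb.transform(labels_test)
print(y_train.shape, y_test.shape)  # (3, 4) (2, 4)

# Hypothetical predictions: a copy of the truth with one label flipped.
y_pred = y_test.copy()
y_pred[0, 2] = 0
print(hamming_loss(y_test, y_pred))  # 0.125 (1 wrong label out of 8)
```

Fitting on the combined labels matters when a label such as 'nice' appears only in the test split; otherwise `transform` would ignore it with a warning.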

Vivek Kumar
