I am following this piece of code http://queirozf.com/entries/scikit-learn-pipeline-examples in order to develop a Multilabel OnevsRest classifier for text. I would like to compute the hamming_score and thus would need to binarize my test labels as well. I thus have:
X_train, X_test, labels_train, labels_test = train_test_split(meetings, labels, test_size=0.4)
Here, labels_train and labels_test are list of lists
[['dog', 'cat'], ['cat'], ['people'], ['nice', 'people']]
Now I need to binarize all my labels, I am therefore doing this...
all_labels = np.vstack([labels_train, labels_test])
mlb = MultiLabelBinarizer().fit(all_labels)
As directed by in the link. But that throws
ValueError: all the input array dimensions except for the concatenation axis must match exactly
I used np.column_stack as directed here
numpy array concatenate: "ValueError: all the input arrays must have same number of dimensions"
but that throws the same error.
How can the dimensions be the same if I am splitting on train and test, I am bound to get different shapes right? Please help, thank you.