
I have a dataset with features and their labels.

It looks like this:

X1, X2, X3, X4, X5 .. Xn L1, L2, L3
Y1, Y2, Y3, Y4, Y5 .. Yn L5, L2
..

I want to train a KNeighborsClassifier on this dataset, but it seems like sklearn does not accept multilabel targets. I have been trying this:

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(Y)

# parameters to try: n_neighbors=[5, 15], weights=['uniform', 'distance']
bagging = BaggingClassifier(KNeighborsClassifier(n_neighbors=5, weights='uniform'),
                            max_samples=0.6, max_features=0.7, verbose=1, oob_score=True)
scores = cross_val_score(bagging, X, Y, verbose=1, cv=3, n_jobs=3, scoring='f1_macro')

It is giving me `ValueError: bad input shape`.

Is there a way I can run a multilabel classifier in sklearn?

pg2455
    `KNeighborsClassifier` does take multi-labels, but `BaggingClassifier` does not. https://github.com/scikit-learn/scikit-learn/issues/4758 – yangjie Aug 17 '15 at 14:45

3 Answers


According to the sklearn documentation, the classifiers that support multioutput-multiclass classification tasks are:

Decision Trees, Random Forests, Nearest Neighbors
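
For example, here is a minimal sketch of training `KNeighborsClassifier` directly on the binarized label matrix (`X` and `Y` here stand in for the asker's feature matrix and the output of `MultiLabelBinarizer`):

from sklearn.neighbors import KNeighborsClassifier

# KNeighborsClassifier accepts a 2D label indicator matrix natively,
# so no one-vs-rest wrapper is needed
knn = KNeighborsClassifier(n_neighbors=5, weights='uniform')
knn.fit(X, Y)            # Y: (n_samples, n_labels) binary matrix
pred = knn.predict(X)    # returns an indicator matrix of the same shape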

Asia

Since you have a binary matrix for your labels, you can use OneVsRestClassifier to make your BaggingClassifier handle multilabel predictions. The code should then look like this:

from sklearn.multiclass import OneVsRestClassifier

bagging = BaggingClassifier(KNeighborsClassifier(n_neighbors=5, weights='uniform'),
                            max_samples=0.6, max_features=0.7, verbose=1, oob_score=True)
clf = OneVsRestClassifier(bagging)
scores = cross_val_score(clf, X, Y, verbose=1, cv=3, n_jobs=3, scoring='f1_macro')

You can use the OneVsRestClassifier with any of the sklearn models to do multilabel classification.

Here's an explanation:

http://scikit-learn.org/stable/modules/multiclass.html#one-vs-the-rest

And here are the docs:

http://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html
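
As a quick sanity check, here is a minimal fit/predict sketch with the wrapped estimator (X_train, Y_train and X_test are hypothetical train/test splits of the data above):

from sklearn.ensemble import BaggingClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier

# OneVsRestClassifier fits one bagged KNN per label column of Y
clf = OneVsRestClassifier(
    BaggingClassifier(KNeighborsClassifier(n_neighbors=5, weights='uniform'),
                      max_samples=0.6, max_features=0.7))
clf.fit(X_train, Y_train)      # Y_train: binary indicator matrix
Y_pred = clf.predict(X_test)   # predictions have the same (n_samples, n_labels) shape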

hume

For anybody who finds this looking for multi-label KNN (MLkNN) options, I would recommend skmultilearn, which is built on top of sklearn, so it is easy to use if you are already familiar with that package.

Documentation here. This example is from the documentation:

from skmultilearn.adapt import MLkNN

classifier = MLkNN(k=3)  # k: number of nearest neighbours

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)
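
Note that skmultilearn typically returns its predictions as a scipy sparse matrix; if you need a dense numpy array (e.g. to feed into sklearn metrics), converting it should be as simple as:

dense_predictions = predictions.toarray()
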
Jaccar