Using RandomizedSearchCV in Multi-label classification

Question

I'm working on a multi-label problem and started using sklearn, which offers very nice out-of-the-box methods for handling multi-label tasks. I was using MultiOutputClassifier with RandomForestClassifier as the estimator. Example with 4 labels:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

# The data from your screenshot
#  A      B   C    D    E   F   (six feature columns per row)
x = np.array([
  [133.5, 27, 284, 638, 31, 220],
  [111.9, 27, 285, 702, 36, 230],
  [99.3, 25, 310, 713, 39, 227],
  [102.5, 25, 311, 670, 34, 218]
])

y = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1]])
forest = RandomForestClassifier(n_estimators=100)
classifier = MultiOutputClassifier(forest)
classifier.fit(x, y)

This code produces one classifier for each label (in this case we end up with 4 classifiers). My questions are:

1. Is it possible to pass a different classifier for each label (is there an out-of-the-box implementation for that in sklearn)?
2. I tried to apply RandomizedSearchCV directly to the MultiOutputClassifier, but it seems that only one model is chosen across all the parameters, instead of one best model being chosen for each label. What is the motivation for that? Are the same model parameters used for the different classifiers? Is it possible to use the out-of-the-box MultiOutputClassifier and RandomizedSearchCV to get the best model for each label?

I also tried the example from there, but it still returns only one final classifier.

Thank you!

score 2 · Accepted Answer · answered Jul 01 '20 at 08:30

The things you are trying to achieve are beyond the purpose of the sklearn.multioutput module. In its documentation it says:

The estimators provided in this module are meta-estimators: they require a base estimator to be provided in their constructor. The meta-estimator extends single output estimators to multioutput estimators.

Here, the last sentence is the relevant one for answering your questions. scikit-learn has estimators that support multilabel problems out-of-the-box, such as the KNeighborsClassifier (reference). In this case, you would also get only one estimator that predicts more than one label.
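As a minimal sketch of that point, the snippet below fits a single KNeighborsClassifier on the data from the question and predicts all four labels at once. The choice of n_neighbors=3 is an assumption made only because the toy data has four rows; it is not a recommendation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data copied from the question: 4 samples, 6 features, 4 labels
x = np.array([
    [133.5, 27, 284, 638, 31, 220],
    [111.9, 27, 285, 702, 36, 230],
    [99.3, 25, 310, 713, 39, 227],
    [102.5, 25, 311, 670, 34, 218],
])
y = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1]])

# One estimator handles the 2D label matrix directly, no meta-estimator needed
knn = KNeighborsClassifier(n_neighbors=3)  # n_neighbors=3 is an illustrative assumption
knn.fit(x, y)

pred = knn.predict(x)
print(pred.shape)  # (4, 4): one row per sample, one column per label
```

There is a single set of hyperparameters here, shared by every label, which is the same situation as with MultiOutputClassifier.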

The purpose of the sklearn.multioutput module is to extend those estimators that do not support multilabel tasks, such as SVC, so that they provide the same functionality. It is not meant to provide several estimators with different hyperparameters. This is why you cannot use these meta-estimators for what you want to accomplish.

Addressing your questions specifically:

1. No, this is (at least currently) not possible with this or any other module in scikit-learn.
2. Again, no, as this is against the purpose of the sklearn.multioutput module. Providing such a feature in scikit-learn also would not add much convenience. If you want to have different estimators with different hyperparameters for each label, then you have to do it separately. There is no other way, and any meta-estimator would have to do the same. This is probably why this functionality is not provided.
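Doing it separately could look roughly like the sketch below: one RandomizedSearchCV per label column, each keeping its own best estimator. Everything specific here is an assumption for illustration: the synthetic data from make_multilabel_classification (the four rows in the question are too few for cross-validation), the small parameter grid, and the names best_models and predictions.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption): 200 samples, 6 features, 4 labels
X, Y = make_multilabel_classification(
    n_samples=200, n_features=6, n_classes=4, random_state=0
)

# Illustrative search space (assumption), kept small so the sketch runs quickly
param_distributions = {
    "n_estimators": [10, 25, 50],
    "max_depth": [2, 4, None],
}

best_models = []
for j in range(Y.shape[1]):
    # Independent search for label j, treated as a plain binary problem
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=0),
        param_distributions=param_distributions,
        n_iter=3,
        cv=3,
        random_state=j,
    )
    search.fit(X, Y[:, j])
    best_models.append(search.best_estimator_)
    print(f"label {j}: {search.best_params_}")

# Stack the per-label predictions back into an (n_samples, n_labels) matrix
predictions = np.column_stack([model.predict(X) for model in best_models])
print(predictions.shape)
```

Each entry of best_models is tuned only for its own label, so the selected hyperparameters may differ from label to label. Swapping in a different estimator class per label would follow the same loop.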
