I'm working on a multi-label problem and I started to use sklearn, which offers very nice out-of-the-box methods to handle multi-label classification. I was using MultiOutputClassifier
with RandomForestClassifier
as the estimator. Example with 4 labels:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier
# Example data: 6 features, 4 binary labels
#     A    B    C    D   E    F
x = np.array([
    [133.5, 27, 284, 638, 31, 220],
    [111.9, 27, 285, 702, 36, 230],
    [99.3, 25, 310, 713, 39, 227],
    [102.5, 25, 311, 670, 34, 218]
])
y = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1]])
forest = RandomForestClassifier(n_estimators=100)
classifier = MultiOutputClassifier(forest)
classifier.fit(x, y)
This code produces one classifier per label (in this case we end up with 4 classifiers). My questions are:
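For reference, I can confirm the one-classifier-per-label behaviour by inspecting the fitted wrapper's estimators_ attribute (a minimal check, reusing the arrays above):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

x = np.array([[133.5, 27, 284, 638, 31, 220],
              [111.9, 27, 285, 702, 36, 230],
              [99.3, 25, 310, 713, 39, 227],
              [102.5, 25, 311, 670, 34, 218]])
y = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1]])

classifier = MultiOutputClassifier(RandomForestClassifier(n_estimators=100))
classifier.fit(x, y)

# estimators_ holds one independently fitted clone per label column
print(len(classifier.estimators_))  # 4
```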
- Is it possible to pass a different classifier for each label? Is there any out-of-the-box implementation for that in sklearn?
- I tried to apply RandomizedSearchCV directly to the MultiOutputClassifier, but it seems that only one set of parameters is chosen across all labels, instead of choosing the best model for each label separately. What is the motivation for that? Why are the same model parameters used for the different per-label classifiers? Is it possible to use MultiOutputClassifier and RandomizedSearchCV out of the box to get the best model for each label?
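For context, the workaround I'm currently considering is to run a separate search per label column and keep the best estimator for each. This is a sketch, not an out-of-the-box solution; the synthetic data and parameter grid below are made up for illustration:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for a 4-label problem (made up for illustration)
X, Y = make_multilabel_classification(n_samples=100, n_features=6,
                                      n_classes=4, random_state=0)

# Hypothetical search space
param_dist = {"n_estimators": [10, 25], "max_depth": [None, 5]}

# One independent search per label column; each label may end up with
# different best hyper-parameters (a different estimator class could
# even be swapped in per column here).
best_per_label = []
for col in range(Y.shape[1]):
    search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                                param_dist, n_iter=4, cv=3, random_state=0)
    search.fit(X, Y[:, col])
    best_per_label.append(search.best_estimator_)

print(len(best_per_label))  # one tuned classifier per label
```

By contrast, wrapping MultiOutputClassifier in RandomizedSearchCV with `estimator__`-prefixed parameters tunes a single parameter set that is then cloned for every label, which matches the behaviour I observed.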
I also tried the example from there, but it still returns only one final classifier.
Thank you!