RandomForestClassifier in Multi-label problem - how it works?

Question

How does the RandomForestClassifier of sklearn handle a multilabel problem (under the hood)?

For example, does it break the problem into distinct single-label problems?

To be clear, I have not tested it yet, but the documentation of the .fit() method of RandomForestClassifier lists y : array-like, shape = [n_samples] or [n_samples, n_outputs].

I am interested to know how it works, I am currently dealing with a similar problem. — Vishwas, Jul 22 '19 at 14:26
Probably duplicate of https://datascience.stackexchange.com/questions/30208/problem-to-classify-multilabel-dataset-while-using-random-forest-algorithm . Rf classifier does not provide multilabel problem — s3nh, Jul 22 '19 at 14:30

Alexandre Huat · Answer 1 · 2019-07-22T14:56:08.187

Let me quote the scikit-learn documentation. From the user guide on random forests:

Like decision trees, forests of trees also extend to multi-output problems (if Y is an array of size [n_samples, n_outputs]).

From the "Multi-output problems" section of the user guide on decision trees:

… to support multi-output problems. This requires the following changes:

Store n output values in leaves, instead of 1;

Use splitting criteria that compute the average reduction across all n outputs.
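The two changes quoted above can be observed directly on a fitted forest. The following is a minimal sketch, assuming a synthetic dataset from make_multilabel_classification with 3 labels (the data, sizes and variable names are illustrative, not from the question). It suggests that one set of trees is shared by all labels rather than one forest being trained per label:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic multilabel data (an assumption for illustration):
# Y is a binary indicator matrix of shape (n_samples, n_outputs).
X, Y = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=3, random_state=0
)

forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, Y)

# A single forest handles all outputs at once.
n_outputs = forest.n_outputs_

# predict returns one column per label.
pred = forest.predict(X[:5])

# predict_proba returns a list with one probability array per label.
proba = forest.predict_proba(X[:5])

# Each node of each tree stores values for every output:
# the array has shape (node_count, n_outputs, max_n_classes).
leaf_values = forest.estimators_[0].tree_.value

print(n_outputs, pred.shape, len(proba), leaf_values.shape)
```

Here `leaf_values.shape[1]` equals the number of labels, which matches "store n output values in leaves, instead of 1".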

I hope this answers your question. If not, see the reference cited in that section:

M. Dumont et al., Fast multi-class image annotation with random subwindows and multiple output randomized trees, International Conference on Computer Vision Theory and Applications, 2009.

score 0 · Answer 2 · answered Jul 22 '19 at 14:35

I was a bit confused when I started using trees. If you refer to the sklearn doc:

https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier

If you scroll down to the predict_proba method, you can see: "The predicted class probability is the fraction of samples of the same class in a leaf."

So in predict, the predicted class is the mode of the classes of the training samples in that leaf. This can change if you use class weights:

"class_weight : dict, list of dicts, “balanced” or None, default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one."
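As a rough illustration of the two points above, here is a small sketch on made-up data (the arrays and the weight values are assumptions chosen for the example). All samples share the same feature value, so the tree cannot split and consists of a single leaf, which makes the leaf fractions easy to check by hand:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data (illustrative only): identical features, so the tree is one leaf
# containing 8 samples of class 0 and 2 samples of class 1.
X = np.zeros((10, 1))
y = np.array([0] * 8 + [1] * 2)

# Without class weights, predict_proba is the class fraction in the leaf
# and predict returns the majority class.
plain = DecisionTreeClassifier(random_state=0).fit(X, y)
plain_proba = plain.predict_proba(X[:1])   # fractions 8/10 and 2/10
plain_pred = plain.predict(X[:1])          # class 0

# With class_weight, each sample counts with the weight of its class:
# class 0 -> 8 * 1 = 8, class 1 -> 2 * 10 = 20, so class 1 now dominates.
weighted = DecisionTreeClassifier(class_weight={0: 1, 1: 10}, random_state=0).fit(X, y)
weighted_proba = weighted.predict_proba(X[:1])   # fractions 8/28 and 20/28
weighted_pred = weighted.predict(X[:1])          # class 1

print(plain_proba, plain_pred, weighted_proba, weighted_pred)
```

The same class_weight parameter is accepted by RandomForestClassifier; for multi-output targets the documentation quoted above allows a list of dicts, one per output.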

Hope this helps! :)

Alfonso

Thank you Alfonso, but I am curious whether you have or know of any examples using class_weight. – Vishwas Jul 22 '19 at 15:28
@Vishwas No, sorry, I haven't. I assume it is meant for unbalanced classes, but I have never used it. I would rather use other methods for unbalanced classes, as class_weight makes no logical sense to me. – Alfonso Jul 26 '19 at 13:29
How do you deal with unbalanced classes in a classifier? I need to classify 50 different kinds of defects, but the defect classes are not balanced. Two or three classes are predominant in my data set and the others have few samples. – Vishwas Jul 26 '19 at 13:46