
How does the RandomForestClassifier of sklearn handle a multilabel problem (under the hood)?

For example, does it break the problem into separate single-label problems?

Just to be clear, I have not tested it yet, but the documentation of the .fit() method of RandomForestClassifier lists y : array-like, shape = [n_samples] or [n_samples, n_outputs].

desertnaut
Outcast
  • I am interested to know how it works, I am currently dealing with a similar problem. – Vishwas Jul 22 '19 at 14:26
  • @Vishwas cool :) – Outcast Jul 22 '19 at 14:29
  • Probably a duplicate of https://datascience.stackexchange.com/questions/30208/problem-to-classify-multilabel-dataset-while-using-random-forest-algorithm . The RF classifier does not support multilabel problems – s3nh Jul 22 '19 at 14:30

2 Answers


Let me cite scikit-learn. The user guide of random forest:

Like decision trees, forests of trees also extend to multi-output problems (if Y is an array of size [n_samples, n_outputs]).

The section multi-output problems of the user guide of decision trees:

… to support multi-output problems. This requires the following changes:

  • Store n output values in leaves, instead of 1;
  • Use splitting criteria that compute the average reduction across all n outputs.
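To see what this looks like in practice, here is a minimal sketch. The synthetic data from make_multilabel_classification and the small n_estimators value are my own choices for illustration, not something taken from the user guide. It fits a single forest on a 2D Y and checks that each tree in it is one multi-output tree rather than one tree per label.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic multilabel data: Y has shape (n_samples, n_outputs) with 0/1 entries.
X, Y = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=3, random_state=0
)

clf = RandomForestClassifier(n_estimators=20, random_state=0)
clf.fit(X, Y)  # one forest, fitted on all three labels at once

pred = clf.predict(X[:5])         # array with one column per label
proba = clf.predict_proba(X[:5])  # list with one probability array per label

# Each estimator is a single tree that stores values for every output.
first_tree = clf.estimators_[0]
print(pred.shape, len(proba), clf.n_outputs_, first_tree.n_outputs_)
```

So the problem is not split into separate single-label models here: every tree sees all the labels and its leaves hold one set of values per output.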

I hope this answers your question. If not, you can look at the section's reference:

Alexandre Huat

I was a bit confused when I started using trees. If you refer to the sklearn doc:

https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier

If you scroll down to the predict_proba method, you can see: "The predicted class probability is the fraction of samples of the same class in a leaf."

So in predict, the predicted class is the mode of the training-sample classes in that leaf, i.e. the class with the highest fraction. This can change if you use class weights, via the class_weight parameter.
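A quick way to check this yourself, as a rough sketch: the iris data, max_depth=2, and the sample index 70 are arbitrary choices of mine, used only to get an impure leaf. No class weights are set, so the plain fractions should match predict_proba.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A shallow tree, so that some leaves are impure and the fractions are not all 0 or 1.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

sample = X[[70]]
leaf_id = tree.apply(sample)[0]     # index of the leaf the sample falls into
in_leaf = tree.apply(X) == leaf_id  # training samples that ended up in the same leaf

# Fraction of each class among the training samples in that leaf.
fractions = np.bincount(y[in_leaf], minlength=3) / in_leaf.sum()

proba = tree.predict_proba(sample)[0]
pred = tree.predict(sample)[0]

print(fractions, proba, pred)
```

Here fractions and proba should agree, and pred is the class with the largest entry in proba.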

"class_weight : dict, list of dicts, “balanced” or None, default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one."
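As a small illustration of the dict form quoted above, here is a sketch on toy data that I made up for the purpose: a single constant feature, so the tree cannot split and the whole training set sits in one leaf. The weight of 8 for class 1 is also an arbitrary choice.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data: one uninformative feature, so the tree stays a single leaf.
X = np.zeros((10, 1))
y = np.array([0] * 8 + [1] * 2)  # class 0 is four times as frequent as class 1

plain = DecisionTreeClassifier(random_state=0).fit(X, y)
weighted = DecisionTreeClassifier(class_weight={0: 1, 1: 8}, random_state=0).fit(X, y)

# Unweighted: fractions 8/10 and 2/10, so class 0 is predicted.
print(plain.predict_proba(X[:1]), plain.predict(X[:1]))

# Weighted: each class-1 sample counts 8 times, giving 8/24 and 16/24,
# so the predicted class flips to 1.
print(weighted.predict_proba(X[:1]), weighted.predict(X[:1]))
```

With weights, the "fraction of samples" becomes a fraction of total sample weight, which is how a rare class can win a leaf.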

Hope this helps! :)

Alfonso
  • Thank you Alfonso, but curious if you have or know some examples using the class_weight. – Vishwas Jul 22 '19 at 15:28
  • @Vishwas Nope, sorry, I haven't. I assume this is for unbalanced classes, but I have never used it. I would rather go with other methods for unbalanced classes, as class_weight makes no logical sense to me. – Alfonso Jul 26 '19 at 13:29
  • How do you deal with unbalanced classes in a classifier? I need to classify 50 different kinds of defects, but the defect classes are not balanced: two or three classes are predominant in my data set and the others have few samples. – Vishwas Jul 26 '19 at 13:46