How does h2o's distributed random forest handle multi-class problems?

Asked Aug 31 '23 at 14:43

The documentation for distributed random forest in h2o states that, for multiclass problems, "a tree is used to estimate the probability of each class separately". Visualising the trees confirms this: each class does seem to have its own completely independent "one-vs-rest" tree.

I am wondering how the scores from these trees are combined into the final score vector - are they just normalized to sum to one?
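
To make the question concrete, the normalization I have in mind would look something like the sketch below. The function name `normalize_scores` and the example scores are made up for illustration; this is only a guess at the behaviour, not something taken from the h2o source.

```python
def normalize_scores(per_class_scores):
    """Rescale independent per-class scores so that they sum to one."""
    total = sum(per_class_scores)
    if total == 0:
        # No class received any score: fall back to a uniform distribution.
        return [1.0 / len(per_class_scores)] * len(per_class_scores)
    return [score / total for score in per_class_scores]


# Hypothetical averaged one-vs-rest scores for three classes.
# They do not need to sum to one, since each class has its own trees.
raw_scores = [0.6, 0.3, 0.3]
probs = normalize_scores(raw_scores)
print(probs)
```

If this is what happens, the final score for a class would depend on the scores of the other classes' trees as well as its own.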

I would also like to understand why this approach was chosen and how it compares to the usual approach of handling multiple classes within a single tree. For individual classes, we see that the multiclass classifier typically performs worse than a dedicated one-vs-rest classifier trained with the same hyperparameters, even though the two should be very similar under the hood.

nickc
