Because I'm working on a highly unbalanced multi-class classification problem, I'm considering balanced random forests (https://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf). Does anyone have experience implementing balanced random forests with H2O? If so, could you please elaborate on the following question:

Is it possible to change the default way H2O creates bootstrap samples so that each tree is grown on a balanced sub-sample of the original data set? That is, for each iteration in the random forest, draw a bootstrap sample from the minority class, then randomly draw the same number of cases, with replacement, from the majority classes.

Flo

1 Answer

H2O's random forest doesn't perform bootstrapping. Instead, it samples rows without replacement at a default rate of 63.2% for each tree, which is the expected fraction of unique rows in a bootstrap sample.
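In case the 63.2% looks arbitrary: the chance that a given row is never picked in n draws with replacement is (1 - 1/n)^n, which tends to 1/e, so roughly 1 - 1/e ≈ 0.632 of the rows are expected to show up at least once. Here is a small sketch in plain Python to check that. Nothing in it is H2O-specific, and the function names are just mine for illustration.

```python
import math
import random


def expected_unique_fraction(n):
    """Expected fraction of distinct rows in a bootstrap sample of size n."""
    # P(a given row is never drawn in n draws) = (1 - 1/n) ** n
    return 1.0 - (1.0 - 1.0 / n) ** n


def simulated_unique_fraction(n, seed=0):
    """Draw one bootstrap sample of size n and measure the distinct fraction."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
    return len(set(sample)) / n


if __name__ == "__main__":
    n = 100_000
    print("closed form:", expected_unique_fraction(n))
    print("simulation: ", simulated_unique_fraction(n))
    print("1 - 1/e:    ", 1.0 - 1.0 / math.e)
```

All three numbers should land very close to 0.632 for a large n.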

If you want to get a balanced sample, you can use the parameter balance_classes together with class_sampling_factors, or use weights_column.
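A minimal sketch of both options in the Python client, assuming your data is in a pandas DataFrame and a local H2O cluster can be started. The helper names (inverse_frequency_weights, train_balanced_drf), the "weight" column name, and the ntrees/seed values are my own choices for illustration, not anything H2O prescribes, and I haven't tuned any of this for your data.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class weights chosen so every class carries the same total weight."""
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {cls: total / (n_classes * cnt) for cls, cnt in counts.items()}


def train_balanced_drf(df, predictors, response,
                       use_weights=False, sampling_factors=None):
    """Train an H2O random forest on a pandas DataFrame (hypothetical helper).

    use_weights=False: let H2O rebalance via balance_classes, optionally with
        class_sampling_factors (one factor per class, in lexicographic order
        of the class labels).
    use_weights=True: add a per-row weight column and pass it as weights_column.
    """
    # Imported lazily so the weight helper above works without H2O installed.
    import h2o
    from h2o.estimators import H2ORandomForestEstimator

    h2o.init()
    params = {"ntrees": 50, "seed": 1}  # illustrative values only

    if use_weights:
        df = df.copy()
        weights = inverse_frequency_weights(df[response].tolist())
        df["weight"] = df[response].map(weights)
        params["weights_column"] = "weight"
    else:
        params["balance_classes"] = True
        if sampling_factors is not None:
            params["class_sampling_factors"] = sampling_factors

    frame = h2o.H2OFrame(df)
    frame[response] = frame[response].asfactor()  # treat response as categorical

    model = H2ORandomForestEstimator(**params)
    model.train(x=predictors, y=response, training_frame=frame)
    return model
```

For example, with 8 rows of class "a" and 2 rows of class "b", inverse_frequency_weights gives each "a" row a weight of 0.625 and each "b" row a weight of 2.5, so both classes sum to the same total weight.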

Lauren