The H2O documentation says that, for regression GBMs, the split on a feature is chosen based on the reduction in squared error.

Is this squared error based on the node residuals, i.e., (resid - mean resid)^2, or on the true response, i.e., (response - mean response)^2? I'm using the gamma/Poisson distributions.

In the case of gamma/Poisson, the loss is the deviance, so why is the squared error used instead?

Shayan Shafiq
Mwiza

1 Answer

In H2O, the squared error is calculated from the node residuals, not from the response. The goal there is to have an unbiased estimator, so the SE to be minimized is calculated as:

SE = MSE * N = Var * N = wyy - (wy)^2 / N

where y denotes the node residuals (not the response), w the weights, and N the number of observations. Here wyy reads as the weighted sum of squared residuals, sum(w * y^2), and wy as the weighted sum of residuals, sum(w * y). You can read more about H2O GBM tree learning in the chapter "Theory and Framework" of this booklet: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/booklets/GBMBooklet.pdf
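To make the formula concrete, here is a minimal Python sketch of how such a squared error and the resulting split gain could be computed from node residuals. This is not H2O's actual code: the function names are hypothetical, and N is taken to be the sum of the weights, which equals the number of observations when all weights are 1.

```python
def node_squared_error(residuals, weights=None):
    """SE = sum(w*y^2) - (sum(w*y))^2 / N, with y the node residuals.

    Assumption: N is the sum of the weights (the observation count
    when every weight is 1). Illustrative only, not H2O's code.
    """
    if weights is None:
        weights = [1.0] * len(residuals)
    n = sum(weights)
    wy = sum(w * y for w, y in zip(weights, residuals))
    wyy = sum(w * y * y for w, y in zip(weights, residuals))
    return wyy - wy * wy / n


def direct_squared_error(residuals, weights=None):
    """Same quantity written as sum(w * (y - weighted mean)^2)."""
    if weights is None:
        weights = [1.0] * len(residuals)
    n = sum(weights)
    mean = sum(w * y for w, y in zip(weights, residuals)) / n
    return sum(w * (y - mean) ** 2 for w, y in zip(weights, residuals))


def split_gain(left, right):
    """Reduction in squared error from splitting a node into left/right."""
    parent = node_squared_error(left + right)
    return parent - node_squared_error(left) - node_squared_error(right)


# Hypothetical residuals in one node, split into two children.
left_resid = [1.0, 2.0, 3.0]
right_resid = [6.0]
parent_se = node_squared_error(left_resid + right_resid)
gain = split_gain(left_resid, right_resid)
```

Both functions return the same value, which shows that the shortcut form is just the sum of squared deviations of the residuals from their node mean. The candidate split with the largest gain would be the one preferred under this criterion.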

Maurever