
What do H2O checkpoints actually do? Does a model created with, say,

```python
from h2o.estimators.gbm import H2OGradientBoostingEstimator

gbm_continued = H2OGradientBoostingEstimator(checkpoint=gbm_orig.model_id, ntrees=50, seed=1234)
```

mean that gbm_continued will have the same parameters and predictive performance as gbm_orig if we do not train it on any new data?

The docs say "This will build a new model as a continuation of a previously generated model", but I am confused about what a "continuation" actually implies. An explanation would be much appreciated. Thanks.

Darren Cook
lampShadesDrifter

1 Answer


The key parameter is ntrees (epochs for a deep learning model). I will quote my own book (Practical Machine Learning with H2O, p.103):

When specifying epochs, or the number of trees, specify the total amount of training you want if you had started from scratch, not how many additional epochs or trees you want.

So, in your case, if your original model was made with 50 trees, your new model will effectively do nothing more than duplicate the existing model. But if your original model was made with ntrees = 20, and your new model uses it as a checkpoint with ntrees = 50, then it will add 30 more trees to the model.
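The "total, not additional" rule boils down to a subtraction. The two helpers below are hypothetical names for illustration only, not part of the H2O API; they just encode the rule quoted above. The error branch is an assumption about how a request for fewer trees than the checkpoint already has would be treated; H2O's own validation may differ.

```python
def trees_added(checkpoint_ntrees: int, requested_ntrees: int) -> int:
    """Extra trees a continued model fits when ntrees means the TOTAL tree count."""
    if requested_ntrees < checkpoint_ntrees:
        # Asking for fewer trees than already exist cannot be a continuation.
        raise ValueError("requested ntrees is below the checkpoint's tree count")
    return requested_ntrees - checkpoint_ntrees


def ntrees_for_extra(checkpoint_ntrees: int, extra_trees: int) -> int:
    """The ntrees value to pass if you think in terms of 'N more trees'."""
    return checkpoint_ntrees + extra_trees


print(trees_added(20, 50))       # 30: the ntrees = 20 -> 50 case
print(trees_added(50, 50))       # 0: a copy of the original model
print(ntrees_for_extra(20, 30))  # 50
```

If you want "30 more trees", pass the checkpoint's count plus 30, not 30.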

Some parameters must stay the same, but some can be altered. E.g. you might lower the learning rate.
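To make "continuation" concrete, here is a toy boosting loop in plain Python (one numeric feature, depth-1 trees, squared error). It is not H2O's implementation, and `ToyGBM`, `Stump` and `fit_stump` are hypothetical names. It assumes continuation means: copy the checkpoint's starting value and trees unchanged, then fit only the missing trees on the current residuals, each using the learning rate given to the new model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Stump:
    """A depth-1 regression tree on a single numeric feature."""
    threshold: float
    left: float   # prediction when x <= threshold
    right: float  # prediction when x > threshold

    def predict(self, x: float) -> float:
        return self.left if x <= self.threshold else self.right


def fit_stump(xs: List[float], targets: List[float]) -> Stump:
    """Least-squares stump: try every split point, keep the lowest SSE."""
    best: Optional[Tuple[float, Stump]] = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, Stump(t, lm, rm))
    if best is None:  # all xs identical: fall back to a constant
        m = sum(targets) / len(targets)
        return Stump(xs[0], m, m)
    return best[1]


@dataclass
class ToyGBM:
    ntrees: int                               # TOTAL trees wanted, not extra ones
    learn_rate: float = 0.1
    checkpoint: Optional["ToyGBM"] = None
    init: float = 0.0
    trees: List[Tuple[float, Stump]] = field(default_factory=list)  # (learn_rate, stump)

    def predict(self, x: float) -> float:
        return self.init + sum(lr * s.predict(x) for lr, s in self.trees)

    def train(self, xs: List[float], ys: List[float]) -> "ToyGBM":
        if self.checkpoint is not None:
            # Continuation: reuse the checkpoint's starting value and trees as-is.
            self.init = self.checkpoint.init
            self.trees = list(self.checkpoint.trees)
        else:
            self.init = sum(ys) / len(ys)
            self.trees = []
        if self.ntrees < len(self.trees):
            raise ValueError("ntrees is a total; it cannot be below the checkpoint's count")
        # Only the missing trees are fitted, each on the current residuals.
        while len(self.trees) < self.ntrees:
            residuals = [y - self.predict(x) for x, y in zip(xs, ys)]
            self.trees.append((self.learn_rate, fit_stump(xs, residuals)))
        return self


xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
orig = ToyGBM(ntrees=20, learn_rate=0.1).train(xs, ys)
same = ToyGBM(ntrees=20, checkpoint=orig).train(xs, ys)                   # nothing new to fit
cont = ToyGBM(ntrees=50, learn_rate=0.05, checkpoint=orig).train(xs, ys)  # 30 new trees, lower rate
```

In this toy, `same` ends up identical to `orig`, and `cont` keeps `orig`'s 20 trees and appends 30 trees fitted with the lower learning rate. Passing different `xs`/`ys` to `train` on the continued model is the toy analogue of continuing on extra data, as asked in the comments.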

Darren Cook
  • Does this mean that if you wanted to create a new model using a checkpoint of another model solely to train on more data (to further refine the older model's weightings), you would just do something like `gbm_continued = H2OGradientBoostingEstimator(checkpoint=gbm_orig.model_id, ...)` then `gbm_continued.train(x=predictors, y=response, training_frame=extra_data, ...)`? Asking because that is ultimately what I'm trying to do with checkpoints, and [other](https://stackoverflow.com/q/44341557/8236733) posts seem to indicate that it is not possible. – lampShadesDrifter Dec 14 '17 at 18:31
  • Yes, that should work. The linked-to answer is about restrictions with cross-validation. (Seems a bit strict, but I've never tried CV and checkpoints together, so I am not going to argue!) – Darren Cook Dec 14 '17 at 18:53