A question about the parallelism in h2o.grid() function

Question

I am trying to use the h2o.grid() function from the h2o package to do some tuning in R. When I set the parallelism parameter to a value larger than 1, it always shows this warning:

Some models were not built due to a failure, for more details run `summary(grid_object, show_stack_traces = TRUE)`

The model_ids in the final grid object also include many models ending with _cv_1, _cv_2, etc., and the number of models does not match the max_models setting in my search_criteria list. I think these are just the models built during the CV process, not the final models.

[Screenshot: grid output when "parallelism" is set larger than 1]

When I leave parallelism at its default or set it to 1, the result is normal: all models end with _model_1, _model_2, etc.

[Screenshot: grid output when "parallelism" is left at the default or set to 1]

Here is my code:

# set the grid
rf_h2o_grid <- list(mtries = seq(3, ncol(train_h2o), 4),
                    max_depth = c(5, 10, 15, 20))

# set the search_criteria
sc <- list(strategy = "RandomDiscrete", 
           seed = 100,
           max_models = 5
           )

# random grid tuning
rf_h2o_grid_tune_random <- h2o.grid(
  algorithm = "randomForest", 
  x = x, 
  y = y,
  training_frame = train_h2o,
  nfolds = 5,                     # use cv to validate the parameters
  fold_assignment = "Stratified",   
  ntrees = 100,
  seed = 100,
  hyper_params = rf_h2o_grid,
  search_criteria = sc
  # parallelism = 6           # when I set it larger than 1, the result always includes some "cv_" models
  )

So how can I use parallelism correctly in h2o.grid()? Thanks for helping!

Seb · Accepted Answer · 2020-11-09T15:54:05.180

This is an actual issue with parallelism in grid search, previously noticed but not reported correctly. Thanks for raising this; we'll try to fix it soon. See https://h2oai.atlassian.net/browse/PUBDEV-7886 if you want to track progress.

Until a proper fix is released, avoid using CV and parallelism together in your grids.
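A minimal sketch of that workaround, assuming the h2o package is installed and a cluster was started with h2o.init(). The wrapper name safe_rf_grid is hypothetical (it is not part of h2o); it only refuses the CV + parallelism combination before handing everything to h2o.grid().

```r
# Hypothetical helper, not part of the h2o package: stops early when
# cross-validation and parallel model building are requested together,
# which is the combination affected by PUBDEV-7886.
safe_rf_grid <- function(train, x, y, hyper_params, search_criteria,
                         nfolds = 0, parallelism = 1, ...) {
  if (nfolds > 1 && parallelism > 1) {
    stop("nfolds > 1 and parallelism > 1 should not be combined (see PUBDEV-7886)")
  }
  h2o::h2o.grid(
    algorithm = "randomForest",
    x = x,
    y = y,
    training_frame = train,
    nfolds = nfolds,              # 0 disables cross-validation
    ntrees = 100,
    seed = 100,
    hyper_params = hyper_params,
    search_criteria = search_criteria,
    parallelism = parallelism,
    ...                           # e.g. validation_frame = <an H2OFrame>
  )
}

# Usage, assuming train_h2o, x, y, rf_h2o_grid and sc from the question:
# grid_cv  <- safe_rf_grid(train_h2o, x, y, rf_h2o_grid, sc, nfolds = 5)       # CV, sequential
# grid_par <- safe_rf_grid(train_h2o, x, y, rf_h2o_grid, sc, parallelism = 6)  # parallel, no CV
```

When CV is dropped in favour of parallelism, passing a held-out validation_frame is one way to still compare models on data they were not trained on.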

Regarding the following error:

Some models were not built due to a failure, for more details run `summary(grid_object, show_stack_traces = TRUE)`

if the error is reproducible, you should get more details by running the grid with verbose = TRUE. Adding the entire error message to the ticket above might also help.
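When collecting details for the ticket, it may also help to list which IDs in the grid are cross-validation fold models. The sketch below relies only on the "_cv_<n>" suffix described in the question; the helper name split_model_ids and the example IDs are made up for illustration.

```r
# Split model IDs into final models and CV fold models, based on the
# "_cv_<n>" suffix reported in the question (an assumption about naming).
split_model_ids <- function(ids) {
  is_cv <- grepl("_cv_[0-9]+$", ids)
  list(final = ids[!is_cv], cv = ids[is_cv])
}

# Hypothetical IDs; with a real grid object use: ids <- unlist(grid@model_ids)
ids <- c("rf_grid_model_1", "rf_grid_model_2",
         "rf_grid_model_2_cv_1", "rf_grid_model_2_cv_2")

parts <- split_model_ids(ids)
print(parts$final)  # IDs without the CV suffix
print(parts$cv)     # IDs that look like CV fold models

# The failure details mentioned in the warning come from:
# summary(grid, show_stack_traces = TRUE)
```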

Thanks a lot! It is helpful! I will avoid using both cv and parallelism in `h2o.grid()` for a while. — Kim.L, Nov 10 '20 at 02:03

score 0 · Answer 2 · answered Feb 06 '21 at 04:08

This is because you set max_models = 5, so your grid will build only 5 models and then stop.

There are three ways to set up early stopping criteria:

"max_models": max number of models created
"max_runtime_secs": max running time in seconds
metric-based early stopping, configured with "stopping_rounds", "stopping_metric", and "stopping_tolerance"
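As a rough illustration, the three options could be written as search_criteria lists like the ones below. The numeric values and the choice of "logloss" are placeholders picked for the example, not recommendations.

```r
# 1. Stop after a fixed number of models (as in the question)
sc_max_models <- list(strategy = "RandomDiscrete",
                      seed = 100,
                      max_models = 5)

# 2. Stop after a time budget, here 10 minutes (illustrative value)
sc_runtime <- list(strategy = "RandomDiscrete",
                   seed = 100,
                   max_runtime_secs = 600)

# 3. Metric-based: stop when the metric has not improved by the given
#    tolerance over the last stopping_rounds models (illustrative values)
sc_metric <- list(strategy = "RandomDiscrete",
                  seed = 100,
                  stopping_rounds = 5,
                  stopping_metric = "logloss",
                  stopping_tolerance = 1e-3)
```

Any one of these lists can be passed as search_criteria to h2o.grid(); the criteria can also be combined in a single list.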
