I am trying to use the h2o.grid() function from the h2o package to do some hyperparameter tuning in R. When I set the parallelism parameter larger than 1, it always shows the warning:
Some models were not built due to a failure, for more details run `summary(grid_object, show_stack_traces = TRUE)`
The model_ids in the final grid object also include many models ending with _cv_1, _cv_2, etc., and the number of models does not match the max_models setting in my search_criteria list. I think they are just the models from the cross-validation process, not the final models.
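To make the naming difference concrete, here is a minimal base R sketch that splits a vector of model IDs into the ones ending in _cv_<n> and the rest. The IDs below are made up for illustration (they are not copied from my grid), and the assumption that the _cv_<n> suffix marks a per-fold cross-validation model is only my guess from the names:

```r
# Hypothetical model IDs, invented for illustration only
model_ids <- c(
  "Grid_DRF_model_1",
  "Grid_DRF_model_2",
  "Grid_DRF_model_1_cv_1",
  "Grid_DRF_model_1_cv_2"
)

# Assumption: an ID ending in _cv_<number> is a per-fold CV model
is_cv <- grepl("_cv_[0-9]+$", model_ids)

cv_ids    <- model_ids[is_cv]   # IDs that look like CV fold models
final_ids <- model_ids[!is_cv]  # IDs that look like final grid models

print(final_ids)
print(cv_ids)
```

With a real grid, the same filter could be applied to the model_ids taken from the grid object to count how many final models were actually built.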
When I set parallelism larger than 1:
When I leave parallelism at its default or set it to 1, the result is normal: all models end with _model_1, _model_2, etc.
When I leave the "parallelism" default or set it to 1:
Here is my code:
# set the grid
rf_h2o_grid <- list(
  mtries    = seq(3, ncol(train_h2o), 4),
  max_depth = c(5, 10, 15, 20)
)

# set the search_criteria
sc <- list(
  strategy   = "RandomDiscrete",
  seed       = 100,
  max_models = 5
)

# random grid tuning
rf_h2o_grid_tune_random <- h2o.grid(
  algorithm       = "randomForest",
  x               = x,
  y               = y,
  training_frame  = train_h2o,
  nfolds          = 5,  # use cv to validate the parameters
  fold_assignment = "Stratified",
  ntrees          = 100,
  seed            = 100,
  hyper_params    = rf_h2o_grid,
  search_criteria = sc
  # parallelism = 6 # when I set it larger than 1, the result always includes some "cv_" models
)
So how can I use parallelism correctly in h2o.grid()? Thanks for helping!