I am trying to run a hyperparameter grid search for two algorithms (random forest and GBM) on different parts of a data set, using h2o. My code looks like this:
library(h2o)
library(parallel)    # detectCores()
library(data.table)  # %like%

for (...)
{
  # read data
  # setup h2o cluster
  h2o <- h2o.init(ip = "localhost", port = 54321, nthreads = detectCores() - 1)
  gbm.grid <- h2o.grid("gbm", grid_id = "gbm.grid",
                       x = names(td.train.h2o)[!names(td.train.h2o) %like% segment_binary],
                       y = segment_binary,
                       seed = 42, distribution = "bernoulli",
                       training_frame = td.train.h2o, validation_frame = td.train.hyper.h2o,
                       hyper_params = hyper_params, search_criteria = search_criteria)
  # shutdown h2o
  h2o.shutdown(prompt = FALSE)
  # setup h2o cluster
  h2o <- h2o.init(ip = "localhost", port = 54321, nthreads = detectCores() - 1)
  rf.grid <- h2o.grid("randomForest", grid_id = "rf.grid",
                      x = names(td.train.h2o)[!names(td.train.h2o) %like% segment_binary],
                      y = segment_binary,
                      seed = 42, distribution = "bernoulli",
                      training_frame = td.train.h2o, validation_frame = td.train.hyper.h2o,
                      hyper_params = hyper_params, search_criteria = search_criteria)
  h2o.shutdown(prompt = FALSE)
}
The problem is that if I run the for loop in one go, I get this error:
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix, :
Unexpected CURL error: Failed to connect to localhost port 54321: Connection refused
P.S.: I am using the lines
# shutdown h2o
h2o.shutdown(prompt = FALSE)
# setup h2o cluster
h2o <- h2o.init(ip = "localhost", port = 54321, nthreads = detectCores() - 1)
to "reset" h2o so that I do not run out of memory. I also read R H2O - Memory management, but it is not clear to me how it works.
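One possible reading of the "Connection refused" error, offered here only as an assumption: h2o.shutdown() may return before the old JVM has actually exited, so the h2o.init() that follows immediately can attach to a cluster that is still going down. A minimal sketch of a reset step that waits for the old cluster to stop answering before starting a new one, using h2o.getConnection() and h2o.clusterIsUp(). The helper names wait_until() and restart_h2o() are hypothetical, and the timeout and polling interval are arbitrary:

```r
# Sketch only. Assumption: h2o.shutdown() can return before the old H2O JVM
# has exited, so we poll until it stops answering before calling h2o.init().

# Hypothetical helper: call predicate() every `interval` seconds until it
# returns TRUE (result TRUE) or `timeout` seconds have elapsed (result FALSE).
wait_until <- function(predicate, timeout = 60, interval = 1) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(predicate())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Hypothetical helper: shut down the current cluster, wait until it no longer
# responds, then start a fresh one with the same settings as in the question.
restart_h2o <- function(port = 54321, timeout = 60) {
  conn <- h2o::h2o.getConnection()  # keep a handle to the old cluster
  h2o::h2o.shutdown(prompt = FALSE)
  is_down <- function() {
    # Treat any error while probing as "not reachable".
    !isTRUE(tryCatch(h2o::h2o.clusterIsUp(conn), error = function(e) FALSE))
  }
  if (!wait_until(is_down, timeout = timeout)) {
    stop("old H2O cluster was still responding after ", timeout, " seconds")
  }
  h2o::h2o.init(ip = "localhost", port = port,
                nthreads = parallel::detectCores() - 1)
}
```

Inside the loop, the shutdown/init pair would then become h2o <- restart_h2o(). Any H2OFrame created before the restart (such as td.train.h2o) lives on the old cluster and would have to be uploaded again afterwards.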
UPDATE
Following Matteusz's comment, I now call h2o.init() once outside the for loop and use h2o.removeAll() inside the loop. My code now looks like this:
h2o <- h2o.init(ip = "localhost", port = 54321, nthreads = detectCores() - 1)
for (...)
{
  # read data
  gbm.grid <- h2o.grid("gbm", grid_id = "gbm.grid",
                       x = names(td.train.h2o)[!names(td.train.h2o) %like% segment_binary],
                       y = segment_binary,
                       seed = 42, distribution = "bernoulli",
                       training_frame = td.train.h2o, validation_frame = td.train.hyper.h2o,
                       hyper_params = hyper_params, search_criteria = search_criteria)
  h2o.removeAll()
  rf.grid <- h2o.grid("randomForest", grid_id = "rf.grid",
                      x = names(td.train.h2o)[!names(td.train.h2o) %like% segment_binary],
                      y = segment_binary,
                      seed = 42, distribution = "bernoulli",
                      training_frame = td.train.h2o, validation_frame = td.train.hyper.h2o,
                      hyper_params = hyper_params, search_criteria = search_criteria)
  h2o.removeAll()
}
It seems to work, but now I get this error (?) in the grid search for random forest. Any ideas what this might be?