I am trying to parallelize the hyperparameter tuning of an xgboost model that I am tuning in mlr, using parallelMap. I have code that works successfully on my Windows machine (with only 8 cores) and would like to make use of a Linux server (with 72 cores). So far I have not been able to gain any computational advantage by moving to the server, and I think this is a result of holes in my understanding of the parallelMap parameters.
I do not understand the differences between multicore, local, and socket as "modes" in parallelMap. Based on my reading, I think that multicore should work for my situation, but I am not sure. I used socket successfully on my Windows machine and have tried both socket and multicore on my Linux server, with unsuccessful results.
parallelStart(mode = "socket", cpus = 8, level = "mlr.tuneParams")
but it is my understanding that socket may be unnecessary, or perhaps even slow, for parallelizing over many cores that do not need to communicate with each other, as is the case when parallelizing hyperparameter tuning.
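To make my (possibly wrong) understanding of the three modes concrete, here is how I currently read them, as a sketch using parallelMap's convenience wrappers:

```r
library(parallelMap)

# "multicore": forks the current R session (Unix only); workers share memory
# copy-on-write, so startup and data-transfer overhead is low.
parallelStartMulticore(cpus = 8, level = "mlr.tuneParams")
parallelStop()

# "socket": launches separate R sessions that communicate over sockets; works
# on Windows too, but each worker is started fresh and all needed objects
# must be serialized and shipped to it, which adds overhead.
parallelStartSocket(cpus = 8, level = "mlr.tuneParams")
parallelStop()

# "local": no parallelization at all -- everything runs sequentially in the
# current session, which is mainly useful for debugging.
parallelStartLocal()
parallelStop()
```

If that reading is right, multicore seems like the natural choice on the Linux server.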
To elaborate on my unsuccessful results on the Linux server: I am not getting errors, but things that would take less than 24 hours in serial are taking more than 2 weeks in parallel. Looking at the processes, I can see that I am indeed using several cores.
Each individual call to xgboost runs in a matter of a few minutes, and I am not trying to speed that up. I am only trying to tune hyperparameters over several cores.
I was concerned that my very slow results on the Linux server might be due to xgboost trying to make use of the available cores during model building, so I passed nthread = 1
to xgboost via mlr to ensure that does not happen. Nonetheless, my code runs much slower on my larger Linux server than it does on my smaller Windows computer -- any thoughts as to what might be happening?
Thanks so very much.
xgb_learner_tune <- makeLearner(
  "classif.xgboost",
  predict.type = "response",
  par.vals = list(
    objective = "binary:logistic",
    eval_metric = "map",
    nthread = 1))
library(parallelMap)
parallelStart(mode = "multicore", cpus = 8, level = "mlr.tuneParams")
tuned_params_trim <- tuneParams(
  learner = xgb_learner_tune,
  task = trainTask,
  resampling = resample_desc,
  par.set = xgb_params,
  control = control,
  measures = list(ppv, tpr, tnr, mmce)
)
parallelStop()
Edit
I am still surprised by the lack of performance improvement when parallelizing at the tuning level. Are my expectations unfair? I am getting substantially slower performance with parallelMap
than with serial tuning for the process below:
numeric_ps = makeParamSet(
  makeNumericParam("C", lower = 0.5, upper = 2.0),
  makeNumericParam("sigma", lower = 0.5, upper = 2.0)
)
ctrl = makeTuneControlRandom(maxit=1024L)
rdesc = makeResampleDesc("CV", iters = 3L)
#In serial
start.time.serial <- Sys.time()
res.serial = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
                        par.set = numeric_ps, control = ctrl)
stop.time.serial <- Sys.time()
stop.time.serial - start.time.serial
#In parallel with 2 CPUs
start.time.parallel.2 <- Sys.time()
parallelStart(mode = "multicore", cpus = 2, level = "mlr.tuneParams")
res.parallel.2 = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
                            par.set = numeric_ps, control = ctrl)
parallelStop()
stop.time.parallel.2 <- Sys.time()
stop.time.parallel.2 - start.time.parallel.2
#In parallel with 16 CPUs
start.time.parallel.16 <- Sys.time()
parallelStart(mode = "multicore", cpus = 16, level = "mlr.tuneParams")
res.parallel.16 = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
                             par.set = numeric_ps, control = ctrl)
parallelStop()
stop.time.parallel.16 <- Sys.time()
stop.time.parallel.16 - start.time.parallel.16
My console output is (tuning details omitted):
> stop.time.serial - start.time.serial
Time difference of 33.0646 secs
> stop.time.parallel.2 - start.time.parallel.2
Time difference of 2.49616 mins
> stop.time.parallel.16 - start.time.parallel.16
Time difference of 2.533662 mins
I would have expected things to be faster in parallel. Is that unreasonable for this example? If so, when should I expect performance improvements in parallel?
Looking at the terminal, I do seem to be using 2 (and 16) threads/processes (apologies if my terminology is incorrect).
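If it helps diagnose, a minimal check that forking itself yields a speedup on the server, using only the base parallel package (so independent of mlr and parallelMap), would be something like the following, where the 0.2-second sleep stands in for a single model fit:

```r
library(parallel)

# A toy "job" that just sleeps, so any slowdown must come from the
# parallelization machinery rather than the work itself.
f <- function(i) { Sys.sleep(0.2); i }

t.serial   <- system.time(lapply(1:16, f))["elapsed"]                # ~3.2 s
t.parallel <- system.time(mclapply(1:16, f, mc.cores = 8))["elapsed"] # ~0.4 s if forking works
```

If t.parallel is not clearly smaller than t.serial here, the problem is with forking on the server itself; if it is, the overhead must be coming in at the mlr/parallelMap layer.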
Thanks so much for any further input.