Following my previous question and the recommendations addressed in its comments, I was trying to find a proper value for the maxit argument of makeTuneControlRandom, so that when I shrink the lower:upper interval the optimized hyperparameter does not change. In doing so, I came across a case I cannot explain: suppose the hyperparameter to tune is max_depth, which has to be an integer. In the first step, I defined the search space as follows:
library(mlr)  # mlr loaded as in my previous question
set.seed(1365)
# define task
Task <- mlr::makeClassifTask(id = "classif.xgboost",
                             data = df,
                             target = "response",
                             weights = NULL,
                             positive = "yes",
                             check.data = TRUE,
                             blocking = folds)
# make a base learner
lrnBase <- makeLearner(cl = "classif.xgboost",
                       predict.type = "prob",
                       predict.threshold = NULL)
# search space for max_depth
paramSet <- makeParamSet(makeIntegerParam(id = "max_depth", lower = 3, upper = 10))
and:
tuneControl <- makeTuneControlRandom(maxit = 50)
As you can see, the only integer values between 3 and 10 are 3, 4, 5, 6, 7, 8, 9, 10, i.e. 8 candidates in total (< 50).
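Just to spell out that count (this is only an illustration, not part of my pipeline):
# the discrete candidates random search can draw max_depth from
seq(3, 10)         # 3 4 5 6 7 8 9 10
length(seq(3, 10)) # 8, well below maxit = 50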
Then I ran the code:
# make an undersample-wrapped learner
lrnUnder <- makeUndersampleWrapper(learner = lrnBase, usw.rate = 0.2, usw.cl = "no")
tuneControl <- makeTuneControlRandom(maxit = 50)
# inner resampling (used by the tuner)
resampin <- makeResampleDesc(method = "CV",
                             iters = 4L,
                             predict = "test")
# make a tuning-wrapped learner
lrnTune <- makeTuneWrapper(learner = lrnUnder,
                           resampling = resampin,
                           measures = fp,
                           par.set = paramSet,
                           control = tuneControl)
# outer resampling, one fold per level of 'folds'
resampout.desc <- makeResampleDesc(method = "CV",
                                   iters = length(levels(folds)),
                                   predict = "both",
                                   fixed = TRUE)
resampout <- makeResampleInstance(desc = resampout.desc, task = Task)
# nested resampling
resamp <- mlr::resample(learner = lrnTune,
                        task = Task,
                        resampling = resampout, # outer
                        measures = f1,
                        models = FALSE,
                        extract = getTuneResult,
                        keep.pred = TRUE)
# train the tuning-wrapped learner on the whole task and inspect the tuned value
mdl <- mlr::train(learner = lrnTune, task = Task)
getTuneResult(mdl)
The tuned max_depth was returned as 7, with a specific confusion matrix (fp = 20, fn = 20). I expected that if I increased the value of the maxit argument, the tuning algorithm would still find the same optimal max_depth. So I set maxit to 100 and, surprisingly, it returned max_depth = 4, and the corresponding confusion matrix was also different (fp = 33, fn = 22). Why can't I find the same optimal value again? Is this due to the included undersampling step, which randomly reduces one of my classes so that the remaining observations change on every run? If so, it seems that I can never arrive at one single tuned model. What are my possible options to overcome this? Thanks a lot in advance.