Following my previous question and the recommendations addressed in its comments, I was trying to find a proper value for the maxit argument of makeTuneControlRandom, so that when I shrink the lower:upper interval the optimized hyperparameter does not change. In doing so, I came across a case I cannot explain: suppose the hyperparameter to tune is max_depth, which has to be an integer. In the first step, I defined the search space as follows:
library(mlr)  # mlr loaded as in my previous question
set.seed(1365)
# define task
Task <- mlr::makeClassifTask(id = "classif.xgboost",
                             data = df,
                             target = "response",
                             weights = NULL,
                             positive = "yes",
                             check.data = TRUE,
                             blocking = folds)
# make a base learner
lrnBase <- makeLearner(cl = "classif.xgboost",
                       predict.type = "prob",
                       predict.threshold = NULL)
# search space for max_depth
paramSet <- makeParamSet(makeIntegerParam(id = "max_depth", lower = 3, upper = 10))
and:
tuneControl <- makeTuneControlRandom(maxit = 50)
As you can see, the only integer values between 3 and 10 are 3, 4, 5, 6, 7, 8, 9, 10, i.e. 8 candidates in total (< 50).
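Just to spell out that count (this is only an illustration, not part of my pipeline):
# the discrete candidates random search can draw max_depth from
seq(3, 10)         # 3 4 5 6 7 8 9 10
length(seq(3, 10)) # 8, well below maxit = 50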
Then I ran the code:
# make an undersample-wrapped learner
lrnUnder <- makeUndersampleWrapper(learner = lrnBase, usw.rate = 0.2, usw.cl = "no")
tuneControl <- makeTuneControlRandom(maxit = 50)
# inner resampling (used by the tuner)
resampin <- makeResampleDesc(method = "CV",
                             iters = 4L,
                             predict = "test")
# make a tuning-wrapped learner
lrnTune <- makeTuneWrapper(learner = lrnUnder,
                           resampling = resampin,
                           measures = fp,
                           par.set = paramSet,
                           control = tuneControl)
# outer resampling, one fold per level of 'folds'
resampout.desc <- makeResampleDesc(method = "CV",
                                   iters = length(levels(folds)),
                                   predict = "both",
                                   fixed = TRUE)
resampout <- makeResampleInstance(desc = resampout.desc, task = Task)
# nested resampling
resamp <- mlr::resample(learner = lrnTune,
                        task = Task,
                        resampling = resampout, # outer
                        measures = f1,
                        models = FALSE,
                        extract = getTuneResult,
                        keep.pred = TRUE)
# train the tuning-wrapped learner on the whole task and inspect the tuned value
mdl <- mlr::train(learner = lrnTune, task = Task)
getTuneResult(mdl)
The tuned max_depth was returned as 7, with a specific confusion matrix (fp = 20, fn = 20). I expected that if I increased the value of the maxit argument, the tuning algorithm would still find the same optimal max_depth. So I set maxit to 100 and, surprisingly, it returned max_depth = 4, and the corresponding confusion matrix was also different (fp = 33, fn = 22). Why can't I find the same optimal value again? Is this due to the included undersampling step, which randomly reduces one of my classes so that the remaining observations change on every run? If so, it seems that I can never arrive at one single tuned model. What are my possible options to overcome this? Thanks a lot in advance.