Why are tuned minsplit & minbucket from rpart decimal numbers?

Question

I estimate a model using a classif.rpart learner. The estimation is embedded in a nested resampling. When I look at the inner tuning results using mlr3tuning::extract_inner_tuning_results(bmr), the values for minbucket and minsplit are decimal numbers (for example, minbucket 0.13 or 2.81, minsplit 2.35 or 4.61). As I understand it, both are numbers of observations, so I expected them to be integers. Is there an explanation for why these values are decimals? Thank you in advance!

Edit: I cannot post the original code I use, but this code shows the same behaviour, using a task from the mlr3 package.

library(mlr3)
library(mlr3tuningspaces)  # provides lts()
library(mlr3pipelines)     # provides the %>>% operator
library(progressr)

# choose task
sonar <- tsk("sonar")

# choose learners
l_rpart <- lrn("classif.rpart")
l_ranger <- lrn("classif.ranger")

# add search spaces to learners
l_rpart$param_set$values <- lts("classif.rpart.default")$values
l_ranger$param_set$values <- lts("classif.ranger.default")$values

# add fallback learners
l_rpart$fallback = lrn("classif.featureless")
l_ranger$fallback = lrn("classif.featureless")

# robustify
rpart_graph <- mlr3pipelines::pipeline_robustify(task = sonar, learner = l_rpart) %>>% mlr3pipelines::po("learner", l_rpart)
rpart_learner <- mlr3::as_learner(rpart_graph)

ranger_graph <- mlr3pipelines::pipeline_robustify(task = sonar, learner = l_ranger) %>>% mlr3pipelines::po("learner", l_ranger)
ranger_learner <- mlr3::as_learner(ranger_graph)

# create autotuners
at_rpart <- mlr3tuning::auto_tuner(
  method = mlr3tuning::tnr("random_search"),
  learner = rpart_learner,
  resampling = mlr3::rsmp("cv", folds = 4),
  measure = mlr3::msr("classif.acc", id = "acc"),
  term_time = 1 * 60,
  term_evals = 4)

at_ranger <- mlr3tuning::auto_tuner(
  method = mlr3tuning::tnr("random_search"),
  learner = ranger_learner,
  resampling = mlr3::rsmp("cv", folds = 4),
  measure = mlr3::msr("classif.acc", id = "acc"),
  term_time = 1 * 60,
  term_evals = 4)

# create the benchmark design
design = benchmark_grid(tasks = sonar,
                        learners = list(at_rpart, at_ranger),
                        resamplings = mlr3::rsmp("cv", folds = 3))

# run the benchmark experiment
bmr = with_progress(benchmark(design, 
                              store_models = TRUE))

# show inner tuning results
mlr3tuning::extract_inner_tuning_results(bmr)

The beginning of the output looks like this. You can see that classif.rpart.minsplit and classif.rpart.minbucket are decimals instead of the integers I would expect:

mlr3tuning::extract_inner_tuning_results(bmr)
   experiment iteration classif.rpart.minsplit classif.rpart.minbucket classif.rpart.cp classif.ranger.mtry.ratio classif.ranger.replace
1:          1         1               2.834898               2.9295168        -9.089721                        NA                     NA
2:          1         2               4.515618               0.5116199        -3.805193                        NA                     NA
3:          1         3               3.484092               2.6164599        -3.131506                        NA                     NA
4:          2         1                     NA                      NA               NA                 0.2700584                  FALSE
5:          2         2                     NA                      NA               NA                 0.1032228                   TRUE
6:          2         3                     NA                      NA               NA                 0.3427129                  FALSE

Thank you again for looking into it.

Can you share the code you're using to do the tuning please? — Lars Kotthoff, Aug 03 '23 at 12:48
@LarsKotthoff, I added some example code that shows the same behaviour as my code. I hope that helps to illustrate what I am wondering about. Let me know if I can add to this in any way. — Theresa, Aug 04 '23 at 12:42

score 0 · Answer 1 · answered Aug 04 '23 at 16:36


This is a problem in the mlr3tuningspaces package, which defines these hyperparameters as numeric when they should be integer. I've opened an issue here: https://github.com/mlr-org/mlr3tuningspaces/issues/42


Lars Kotthoff


Thank you for finding the explanation and opening the issue! – Theresa Aug 07 '23 at 06:54

score 0 · Accepted Answer · answered Aug 07 '23 at 11:09

minsplit and minbucket are tuned on the logarithmic scale in the default tuning space. The values you see in the archive are the values before the transformation, e.g. 2.834898 becomes exp(2.834898) = 17.02866. Since minsplit is defined as an integer, the value is then rounded to 17 before the model is trained. cp is tuned on the log scale as well, which is why its values are negative. See the tuning chapter in the mlr3book for more information.
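To make the back-transformation concrete, here is a minimal base-R sketch that applies exp() to the rpart rows printed in the question and converts minsplit and minbucket to integers. It only mimics the transformation by hand; it is not how mlr3tuning or paradox implement it internally. The helper name to_learner_scale() is hypothetical, and using round() for the integer conversion follows the description above; the exact rounding rule paradox applies is an assumption here.

```r
# rpart rows copied from the extract_inner_tuning_results() output in the question.
# These values live on the log scale (search space), not on the learner's scale.
archived <- data.frame(
  minsplit  = c(2.834898, 4.515618, 3.484092),
  minbucket = c(2.9295168, 0.5116199, 2.6164599),
  cp        = c(-9.089721, -3.805193, -3.131506)
)

# Hypothetical helper: undo the log transformation with exp(), then convert the
# two integer hyperparameters. round() is an assumption that mirrors the answer
# text; paradox may use a different integer conversion.
to_learner_scale <- function(x) {
  data.frame(
    minsplit  = as.integer(round(exp(x$minsplit))),
    minbucket = as.integer(round(exp(x$minbucket))),
    cp        = exp(x$cp)  # cp stays numeric; exp() turns negative values into small positive ones
  )
}

res <- to_learner_scale(archived)
print(res)
```

With these values, the first minsplit of 2.834898 maps to 17, matching the example above, and the negative cp values map to small positive complexity parameters.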
