Benchmarking multiple AutoTuning instances

Question

I have been trying to use mlr3 to do some hyperparameter tuning for xgboost. I want to compare three different models:

xgboost tuned over just the alpha hyperparameter
xgboost tuned over alpha and lambda hyperparameters
xgboost tuned over alpha, lambda, and maxdepth hyperparameters.

After reading the mlr3 book, I thought that using AutoTuner for the nested resampling and benchmarking would be the best way to go about doing this. Here is what I have tried:

task_mpcr <- TaskRegr$new(id = "mpcr", backend = data.numeric, target = "n_reads")

measure <- msr("poisson_loss")

xgb_learn <- lrn("regr.xgboost")

set.seed(103)
fivefold.cv = rsmp("cv", folds = 5)

param.list <- list(  alpha = p_dbl(lower = 0.001, upper = 100, logscale = TRUE),
                 lambda = p_dbl(lower = 0.001, upper = 100, logscale = TRUE),
                 max_depth = p_int(lower = 2, upper = 10)
)


model.list <- list()
for(model.i in 1:length(param.list)){

  param.list.subset <- param.list[1:model.i]
  search_space <- do.call(ps, param.list.subset)

  model.list[[model.i]] <- AutoTuner$new(
    learner = xgb_learn,
    resampling = fivefold.cv,
    measure = measure,
    search_space = search_space,
    terminator = trm("none"),
    tuner = tnr("grid_search", resolution = 10),
    store_tuning_instance = TRUE
  )
}
grid <- benchmark_grid(
task = task_mpcr,
learner = model.list,
resampling = rsmp("cv", folds =3)
)

bmr <- benchmark(grid, store_models = TRUE)

Note that I added Poisson loss as a measure for the count data I am working with. For some reason after running the benchmark function, the Poisson loss of all my models is nearly identical per fold, making me think that no tuning was done.

I also cannot find a way to access the hyperparameters used to get the lowest loss per train/test iteration. Am I misusing the benchmark function entirely? Also, this is my first question on SO, so any formatting advice would be appreciated!

score 2 · Accepted Answer · answered Mar 24 '21 at 09:04

To see whether tuning has an effect, you can just add an untuned learner to the benchmark. Otherwise, the conclusion could be that tuning alpha is sufficient for your example.

I adapted the code so that it runs with an example task.

library(mlr3verse)

task <- tsk("mtcars")

measure <- msr("regr.rmse")

xgb_learn <- lrn("regr.xgboost")

param.list <- list(
  alpha = p_dbl(lower = 0.001, upper = 100, logscale = TRUE),
  lambda = p_dbl(lower = 0.001, upper = 100, logscale = TRUE)
)

model.list <- list()
for(model.i in 1:length(param.list)){
  
  param.list.subset <- param.list[1:model.i]
  search_space <- do.call(ps, param.list.subset)
  
  at <- AutoTuner$new(
    learner = xgb_learn,
    resampling = rsmp("cv", folds = 5),
    measure = measure,
    search_space = search_space,
    terminator = trm("none"),
    tuner = tnr("grid_search", resolution = 5),
    store_tuning_instance = TRUE
  )
  at$id = paste0(at$id, model.i)
  
  model.list[[model.i]] <- at
}

model.list <- c(model.list, list(xgb_learn)) # add baseline learner

grid <- benchmark_grid(
  task = task,
  learner = model.list,
  resampling = rsmp("cv", folds =3)
)

bmr <- benchmark(grid, store_models = TRUE)

autoplot(bmr)

bmr_data = bmr$data$as_data_table() # convert benchmark result to a handy data.table
bmr_data$learner[[1]]$learner$param_set$values # the final learner used by AutoTune is nested in $learner

# best found value during grid search
bmr_data$learner[[1]]$archive$best()

# transformed value (the one that is used for the learner)
bmr_data$learner[[1]]$archive$best()$x_domain

In the last lines you see how one can access the individual runs of the benchmark. Im my example we have 9 runs resulting for 3 learners and 3 outer resampling folds.

Benchmarking multiple AutoTuning instances

1 Answers1