
Recently I have been learning to use the mlr3 package. I did a feature selection and wanted to compare the model results. However, I don't know how to get the model results out of a `BenchmarkResult`, for example which features were used by the model of the feature-selection learner.

Here is my code:

```r
library(mlr3)
library(mlr3learners)  # provides lrn("classif.ranger")
library(mlr3fselect)   # AutoFSelector, fs()
library(mlr3viz)       # autoplot() methods for benchmark results
library(ggplot2)       # theme(), element_text()

# set up task (`test` is the data.frame holding my data)
task = TaskClassif$new(id = "SPI", backend = test, target = "recidivism")

# learner without feature selection
learner_ranger = lrn("classif.ranger",
                     predict_type = "prob",
                     predict_sets = c("train", "test"))
print(learner_ranger)

# set up feature selection
terminator = trm("evals", n_evals = 20)
fselector = fs("random_search")

# create a new learner with feature selection
at = AutoFSelector$new(
  learner = learner_ranger,
  resampling = rsmp("holdout"),
  measure = msr("classif.auc"),
  terminator = terminator,
  fselector = fselector
)
at

# To compare the optimized feature subset with the complete feature set, we can use benchmark():
grid = benchmark_grid(tasks = task,
                      learners = list(at, learner_ranger),
                      resamplings = rsmp("holdout"))

# run the benchmark and keep the fitted models
set.seed(1)
bmr_Wra = benchmark(grid, store_models = TRUE)

# performance measures on the train and test sets
measures = list(
  msr("classif.auc", id = "auc_train", predict_sets = "train"),
  msr("classif.auc", id = "auc_test"),
  msr("classif.acc", id = "acc_test"),
  msr("classif.ce", id = "ce_test")
)

bmr_Wra$aggregate(measures)

# ce
autoplot(bmr_Wra) + theme(axis.text.x = element_text(angle = 45, hjust = 1))

# roc
autoplot(bmr_Wra, type = "roc")

# prc
autoplot(bmr_Wra, type = "prc")
```

I noticed this note in the `BenchmarkResult` documentation:

> Note that it is not feasible to access learned models via this field, as the training task would be ambiguous. For this reason the returned learner are reseted before they are returned. Instead, select a row from the table returned by $score().

So I used `bmr_Wra$score()`, but I still need to analyse the models further and extract information such as variable importance. Any help is appreciated. Thanks a lot!

LengJH
  • Just in case you missed it, there is a dedicated [section on benchmarking](https://mlr3book.mlr-org.com/benchmarking.html) in the mlr3book. If something is missing or unclear there, you're welcome to open an issue or discuss specific points in more detail. – pat-s Jun 21 '21 at 12:40
  • Thanks for your help. The mlr3book explains how to extract a specific learner and training set from a `ResampleResult`, but the model slot of the learner was empty. I don't think I should retrain the learner with `$train(task)` and manually assign the training set just to get the model. Following the book, I used `AutoFSelector` for nested resampling, but the benchmarking section still does not explain how to get the fitted models and the variables they used. – LengJH Jun 22 '21 at 03:46

1 Answer


Install the latest mlr3fselect version from GitHub, which offers new helper functions.

```r
extract_inner_fselect_results(bmr_Wra)
```

Returns the best feature sets of the inner resampling loop. Since you are using hold-out validation, it is just one set.
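A minimal, self-contained sketch of this call, assuming a recent mlr3fselect release that exports `extract_inner_fselect_results()` and that `mlr3learners` is installed for `lrn("classif.ranger")`. The built-in `tsk("sonar")` task stands in for the question's `SPI` task and `n_evals` is lowered to 5 to keep it quick; both are illustrative assumptions. The selected feature names are read from the `features` list column of the result.

```r
library(mlr3)
library(mlr3learners)  # provides lrn("classif.ranger")
library(mlr3fselect)

set.seed(1)

# assumption: built-in binary task used in place of the question's "SPI" task
task = tsk("sonar")
learner_ranger = lrn("classif.ranger", predict_type = "prob")

# feature selection wrapped around ranger, as in the question (fewer evaluations)
at = AutoFSelector$new(
  learner = learner_ranger,
  resampling = rsmp("holdout"),
  measure = msr("classif.auc"),
  terminator = trm("evals", n_evals = 5),
  fselector = fs("random_search")
)

grid = benchmark_grid(tasks = task,
                      learners = list(at, learner_ranger),
                      resamplings = rsmp("holdout"))

# store_models = TRUE keeps the fitted AutoFSelector, which the helper reads from
bmr = benchmark(grid, store_models = TRUE)

# one row per outer resampling iteration of each AutoFSelector in the benchmark;
# the plain ranger learner contributes no rows
inner = extract_inner_fselect_results(bmr)

# the "features" list column holds the names of the selected features
selected = inner$features[[1]]
print(selected)
```

With a holdout outer resampling, `inner` has a single row; with a cross-validation outer resampling you would get one row per fold.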

```r
extract_inner_fselect_archives(bmr_Wra)
```

Returns all evaluated feature sets of the inner resampling loop. Create the `AutoFSelector` with `store_models = TRUE` if you want to access the corresponding models, which are stored in the `ResampleResult`s.
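A sketch of the `store_models = TRUE` variant, under the same assumptions as a plain illustration (recent mlr3fselect, `mlr3learners` installed, `tsk("sonar")` as a stand-in task, 5 evaluations). Reaching the inner models through `$archive$benchmark_result` reflects the archive object of current mlr3fselect versions and is an assumption about your installed version; check `?AutoFSelector` if it differs.

```r
library(mlr3)
library(mlr3learners)  # provides lrn("classif.ranger")
library(mlr3fselect)

set.seed(1)
task = tsk("sonar")  # assumption: stand-in for the question's task

# store_models = TRUE inside the AutoFSelector keeps the models fitted during
# the inner (feature selection) resampling, not only the final model
at = AutoFSelector$new(
  learner = lrn("classif.ranger", predict_type = "prob"),
  resampling = rsmp("holdout"),
  measure = msr("classif.auc"),
  terminator = trm("evals", n_evals = 5),
  fselector = fs("random_search"),
  store_models = TRUE
)

grid = benchmark_grid(tasks = task, learners = list(at), resamplings = rsmp("holdout"))
bmr = benchmark(grid, store_models = TRUE)

# every feature set evaluated in the inner loop, one row each
archives = extract_inner_fselect_archives(bmr)
print(archives)

# assumption: the archive exposes the inner evaluations as a BenchmarkResult
at_trained = bmr$score()$learner[[1]]
inner_bmr = at_trained$archive$benchmark_result
inner_models = lapply(inner_bmr$score()$learner, function(l) l$model)
```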

You might want to read the book chapter on nested resampling.

be-marc
  • Thanks for your help. I have tried installing the latest mlr3fselect version (0.5.1) as well as the development version from GitHub. However, I could not find the functions you mentioned (`extract_inner_fselect_results`). As you can see, I have already used `bmr_Wra = benchmark(grid, store_models = TRUE)` to store the results. – LengJH Jun 22 '21 at 03:48
  • I did not use cv for resampling, in case I might have trouble extracting the benchmark results. The mlr3book explains how to get a specific learner and training set from a `ResampleResult`, but the model slot of the learner was empty. I don't think I should retrain the learner with `$train(task)`. – LengJH Jun 22 '21 at 03:48
  • The model slot is not empty if `store_models = TRUE` is set in `AutoFSelector$new()`. This option is available in `benchmark()` and `AutoFSelector$new()`. – be-marc Jun 22 '21 at 09:57
  • You might need to restart the R session after installing the development version. These functions are [available](https://github.com/mlr-org/mlr3fselect/blob/main/R/extract_inner_fselect_results.R). – be-marc Jun 22 '21 at 10:14
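A sketch of how the fitted models and variable importance could be pulled out of the benchmark result without retraining. Assumptions: a recent mlr3/mlr3fselect/mlr3learners installation, `tsk("sonar")` in place of the `SPI` task, 5 evaluations, and `importance = "impurity"` added to the ranger learner (ranger only stores importance scores when this is set). The trained learners are taken from the `learner` column of `$score()`, as the documentation note in the question suggests.

```r
library(mlr3)
library(mlr3learners)  # provides lrn("classif.ranger") and its $importance()
library(mlr3fselect)

set.seed(1)
task = tsk("sonar")  # assumption: stand-in for the question's task

# importance = "impurity" makes ranger compute variable importance while training
learner_ranger = lrn("classif.ranger", predict_type = "prob", importance = "impurity")

at = AutoFSelector$new(
  learner = learner_ranger,
  resampling = rsmp("holdout"),
  measure = msr("classif.auc"),
  terminator = trm("evals", n_evals = 5),
  fselector = fs("random_search")
)

grid = benchmark_grid(tasks = task,
                      learners = list(at, learner_ranger),
                      resamplings = rsmp("holdout"))
bmr = benchmark(grid, store_models = TRUE)

# one row per learner and outer iteration; with store_models = TRUE the
# "learner" column holds the trained learners
tab = bmr$score(msr("classif.auc"))
is_fs = vapply(tab$learner, inherits, logical(1), what = "AutoFSelector")

# feature-selected learner: selected features, final ranger model, importance
at_trained = tab$learner[is_fs][[1]]
selected = at_trained$fselect_result$features[[1]]
final_model = at_trained$learner$model    # ranger fit on the selected features only
imp_fs = at_trained$learner$importance()  # named importance scores, sorted decreasing

# plain ranger trained on all features
rf_trained = tab$learner[!is_fs][[1]]
imp_all = rf_trained$importance()

print(selected)
print(head(imp_fs))
print(head(imp_all))
```

Selecting rows by class rather than by position avoids depending on the row order of `$score()`.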