How to use predict on new data?

Question

I would like to make predictions on new, previously unseen data using a model created with the mlr3 package. I trained the model using AutoTuner.

I read chapter "3.4.1.4 Predicting" of the mlr3 book, but the solution there does not fit my case, where I want to predict on completely new data.

library("mlr3")
library("paradox")
library("mlr3learners")
library("mlr3tuning")
library("data.table")

set.seed(1)

x1 = 1:100
x2 = 2 * x1
y = x1^2 - x2 + rnorm(100)

data = data.table(
   x1 = x1,
   x2 = x2,
   y = y
)

newdata = data.table(x1 = 101:150, x2 = 2 * 101:150)

task = TaskRegr$new("task", backend = data, target = "y")

lrn_xgb = mlr_learners$get("regr.xgboost")

ps = ParamSet$new(
   params = list(
      ParamInt$new(id = "max_depth", lower = 4, upper = 10)
   ))

at = AutoTuner$new(learner = lrn_xgb, 
                   resampling = rsmp("cv", folds = 2),
                   measures = msr("regr.rmse"), 
                   tune_ps = ps,
                   terminator = term("evals", n_evals = 1),
                   tuner = tnr("random_search"))

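# Nested resampling: estimates the performance of the tuned learner.
# resample() trains copies of `at`, so `at` itself is still untrained afterwards.
resampling_outer = rsmp("cv", folds = 2)

rr = resample(task = task, learner = at, resampling = resampling_outer)

# Final fit: tune on the full task, then train with the best configuration found.
at$train(task)

# Predict on the previously unseen data.
at$predict_newdata(task, newdata)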

Session info:

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8.1 x64 (build 9600)

Matrix products: default

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] mlr3learners_0.1.3 mlr3tuning_0.1.0   data.table_1.12.2 
[4] paradox_0.1.0      mlr3_0.1.3

loaded via a namespace (and not attached):
 [1] lgr_0.3.3        lattice_0.20-38  mlr3misc_0.1.4  
 [4] digest_0.6.21    crayon_1.3.4     grid_3.6.1      
 [7] R6_2.4.0         backports_1.1.4  magrittr_1.5    
[10] stringi_1.4.3    uuid_0.1-2       Matrix_1.2-17   
[13] checkmate_1.9.4  xgboost_0.90.0.2 tools_3.6.1     
[16] compiler_3.6.1   Metrics_0.1.4

I can't get your example to run with the latest versions of the packages. Could you update it please? In general, you probably want to use `learner$predict_newdata()`. – Lars Kotthoff, Oct 02 '19 at 21:34
I have the current versions of the packages from CRAN (I added session info to the main post). The function you suggest will probably be the solution, but there is one more problem at the moment. Chapters 4.3 and 10.1 of the mlr3book end with the resampling function, but the example in the AutoTuner documentation has one more step, "at$train(task)". Do I understand correctly that it is necessary for the final training of the model? – nukubiho, Oct 03 '19 at 07:31

score 4 · Accepted Answer · answered Oct 03 '19 at 17:38

You need to train the selected learner (as you point out in the comments) and then use predict_newdata():

at$train(task)
at$predict_newdata(task, newdata)
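
For readers on newer package versions, the same two steps might look like the sketch below. It is an illustration under stated assumptions, not the code from the question: it assumes recent mlr3, mlr3tuning and paradox releases (where `predict_newdata()` takes the new data as its first argument, the terminator helper is `trm()`, and the search space is built with `ps()`), and it swaps in `regr.rpart` with its `maxdepth` parameter for `regr.xgboost` to avoid the extra dependency.

```r
library(mlr3)
library(mlr3tuning)
library(paradox)
library(data.table)

set.seed(1)

# Same toy data as in the question.
x1 = 1:100
data = data.table(x1 = x1, x2 = 2 * x1, y = x1^2 - 2 * x1 + rnorm(100))
newdata = data.table(x1 = 101:150, x2 = 2 * (101:150))

task = TaskRegr$new("task", backend = data, target = "y")

# Assumption: regr.rpart (tuning maxdepth) stands in for regr.xgboost.
at = AutoTuner$new(
  learner = lrn("regr.rpart"),
  resampling = rsmp("cv", folds = 2),
  measure = msr("regr.rmse"),
  search_space = ps(maxdepth = p_int(lower = 4, upper = 10)),
  terminator = trm("evals", n_evals = 1),
  tuner = tnr("random_search")
)

# Step 1: tune on the full task and fit the final model.
at$train(task)

# Step 2: predict on data the model has never seen (no target column needed).
pred = at$predict_newdata(newdata)
print(head(as.data.table(pred)))
```

Since `newdata` has no `y` column, the prediction object carries responses only; the truth column is missing.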

Lars Kotthoff


I suspect that there is still one more step missing to pass the parameters from the AutoTuner to the train function. I edited the code in the main question, and you will notice that the model parameters ("eta" and "max_depth") are different in these two functions. – nukubiho Oct 03 '19 at 20:58
Also note that we are still making changes to this member function: https://github.com/mlr-org/mlr3/issues/360#issuecomment-538318337. So you might encounter issues when working with it right now. – pat-s Oct 04 '19 at 10:13
@nukubiho The "AutoTuner" automatically tunes its hyperparameters and then trains the model on the task using the best settings from the tuning. What do you mean by "the model parameters are different"? Be careful not to mix up `resample()` with the `$train()` and `$predict_newdata()` calls. This answer here is correct. – pat-s Oct 08 '19 at 12:52
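
The distinction in the comment above can be checked directly: each outer fold of `resample()` runs its own tuning on its own training split, so the selected hyperparameters may differ from fold to fold and from the final `$train()` call. The sketch below is an illustration only. It assumes recent mlr3, mlr3tuning and paradox releases and uses `regr.rpart` with `maxdepth` in place of `regr.xgboost`; the names `fold_depths` and `final_depth` are hypothetical.

```r
library(mlr3)
library(mlr3tuning)
library(paradox)
library(data.table)

set.seed(1)

x1 = 1:100
data = data.table(x1 = x1, x2 = 2 * x1, y = x1^2 - 2 * x1 + rnorm(100))
task = TaskRegr$new("task", backend = data, target = "y")

# Assumption: regr.rpart (tuning maxdepth) stands in for regr.xgboost.
at = AutoTuner$new(
  learner = lrn("regr.rpart"),
  resampling = rsmp("cv", folds = 2),
  measure = msr("regr.rmse"),
  search_space = ps(maxdepth = p_int(lower = 4, upper = 10)),
  terminator = trm("evals", n_evals = 1),
  tuner = tnr("random_search")
)

# Outer resampling: keep the fitted AutoTuner copies so they can be inspected.
rr = resample(task, at, rsmp("cv", folds = 2), store_models = TRUE)
fold_depths = sapply(rr$learners, function(l) l$learner$param_set$values$maxdepth)

# Final fit on the full task: a separate tuning run with its own result.
at$train(task)
final_depth = at$learner$param_set$values$maxdepth

print(fold_depths)
print(final_depth)
```

With a random search of a single evaluation, as in the question, differing values across these runs are expected and do not indicate a missing step.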
@nukubiho Would be great if you could mark the question as "solved" if you agree with my comment :) – pat-s Oct 08 '19 at 12:53
