After studying the mlr3 documentation and the given examples, I still couldn't find out how to impute the target variable in a regression task when it has missing values. I want to use ranger, but it can't handle missing values in the target variable.
Error: Task 'Airtemp' has missing values in column(s) 'T.means.hr', but learner 'regr.ranger' does not support this
This happened PipeOp regr.ranger's $train()
task_Airtemp$missings()
Output:
T.means.hr H.means.hr Rad.means.hr timestamp
266 213 739 0
Thanks to the tutorials and the mlr3book, I was quickly able to include missing indicators and imputation in my workflow as PipeOps, but only for the features.
pom = po("missind") # Add missing indicator columns ("dummy columns") to the Task
pon = po("imputehist", id = "imputer_num") # Imputes numerical features by histogram
For example, you can see that the target variable is unaffected by the PipeOp pom:
task_ext$data()
T.means.hr missing_H.means.hr missing_Rad.means.hr missing_timestamp
1: 23.61 present present present
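For context, here is a minimal sketch of how the two PipeOps above can be combined with the learner into a graph learner. The toy data frame `demo` and the task `task_demo` are hypothetical stand-ins for my real data, and the wiring via po("copy") and po("featureunion") is my assumption of the usual pattern, not necessarily the only way to do it. Note that the target in this toy data is complete; with missing values in the target, training fails with the error shown above.

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3learners)  # provides regr.ranger (requires the ranger package)

# Hypothetical toy data: complete target, missing values only in one feature
set.seed(1)
demo = data.frame(y = rnorm(50), x1 = rnorm(50), x2 = rnorm(50))
demo$x1[c(3, 10, 25)] = NA
task_demo = as_task_regr(demo, target = "y", id = "demo")

pom = po("missind")                         # missing indicator columns
pon = po("imputehist", id = "imputer_num")  # histogram imputation of numeric features

# Send the task down both branches, then merge the resulting feature sets
graph = po("copy", outnum = 2) %>>%
  gunion(list(pom, pon)) %>>%
  po("featureunion") %>>%
  lrn("regr.ranger")
graph_learner = as_learner(graph)

# Works here only because the target column y has no missing values
graph_learner$train(task_demo)
```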
My first idea was to define a task without declaring it as a regression task (as_task() instead of as_task_regr()) and to define the target variable at the end of the workflow for the learner, but that didn't work:
Error in UseMethod("as_task") :
no applicable method for 'as_task' applied to an object of class "data.frame"
The idea of changing the role of the target to a feature with:
task_Airtemp$col_roles$feature = "T.means.hr"
and setting it back to target after the PipeOps pom and pon are done didn't prove successful either.
For the resampling step I want to use RollingWindowCV from the mlr3temporal package. That's why it is important to me to have a time series without missing values.
rr = resample(task_Airtemp, graph_learner, rsmp("RollingWindowCV", folds = 10, fixed_window = TRUE, window_size = window.size, horizon = predict.horizon))
Sorry if I have overlooked something, and thanks for the amazing package. :)