I want to impute missing values using bagging and KNN. How do I do that correctly with mlr3?
Looking at some examples, it seems possible with mlr3pipelines, but I am not 100% sure: po("imputelearner", lrn(<bag or knn learner>)). When I tried it, I ran into a problem (not compatible). Does it matter whether I use a classif or regr learner for imputation when my task is of a different type? For example, I have a classif task but impute with a regr learner, or I have a surv task and impute with either a classif or a regr learner.
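For reference, here is a minimal sketch of the imputelearner pattern mentioned above. It is adapted from the example in the mlr3pipelines documentation and is not a confirmed fix for the survival setup in this post. The pima task and the regr.rpart learner are stand-ins chosen for illustration. As far as the documentation describes it, the type of the imputation learner decides which feature types get imputed (regr for numeric/integer, classif for factor/ordered/logical), independently of the task type.

```r
library(mlr3)
library(mlr3pipelines)

# Stand-in task with missing numeric features (not the actg data from this post)
task_demo = tsk("pima")

# A regr learner is used here, so numeric/integer features are the ones imputed;
# rpart can itself cope with NAs in the remaining (context) columns
imp = po("imputelearner", lrn("regr.rpart"))

# Train the PipeOp on the task and take the imputed task from the output list
imputed = imp$train(list(task_demo))[[1]]

# Count of missing values per column after imputation
imputed$missings()
```

A learner that cannot handle missing values in its own inputs (the documentation uses regr.kknn as an example) would presumably need its own simple imputation step in front of it, e.g. po("imputehist") %>>% lrn("regr.kknn") passed as the learner.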
I also found another package called NADIA that is compatible with mlr3pipelines (PipeOpVIM_kNN, PipeOpmissForest, and PipeOpMice). It would be nice to see a working example, as none of them worked for me.
The four uncommented learners in the code below produce warnings when I train them. Initially I thought it was not a big deal, but tuning these learners always fails.
Task
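library(mlr3); library(mlr3proba); library(mlr3pipelines)  # TaskSurv and tsk("actg") come from mlr3proba
library(mlr3extralearners)  # provides surv.glmnet in current releases
set.seed(1)  # make the NA positions reproducible
na <- sample(1:1151, floor(1151 * 0.1))  # about 10% of the 1151 rows
data = tsk("actg")$data()
data$age[na] <- NA
data$tx[na] <- NA
task = TaskSurv$new("actg_na", backend = data, time = "time", event = "status")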
PipeOps
preproc = po("removeconstants", ratio = 0.05) # remove zero-variance and near-zero-variance features
sim_impute = po("imputemedian", affect_columns = selector_type("numeric")) %>>%
  po("imputemode", affect_columns = selector_type("factor"))
bag_impute = NADIA::PipeOpmissForest$new()
knn_impute = NADIA::PipeOpVIM_kNN$new()
mic_impute = NADIA::PipeOpMice$new()
miA_impute = NADIA::PipeOpMice_A$new()
ran_impute = po("imputesample")
Learners
simlearner = as_learner(preproc %>>% sim_impute %>>% po("encode") %>>%
  po("learner", lrn("surv.glmnet", predict_sets = c("train", "test"))))
baglearner = as_learner(preproc %>>% bag_impute %>>% po("encode") %>>%
  po("learner", lrn("surv.glmnet", predict_sets = c("train", "test"))))
knnlearner = as_learner(preproc %>>% knn_impute %>>% po("encode") %>>%
  po("learner", lrn("surv.glmnet", predict_sets = c("train", "test"))))
miclearner = as_learner(preproc %>>% mic_impute %>>% po("encode") %>>%
  po("learner", lrn("surv.glmnet", predict_sets = c("train", "test"))))
miAlearner = as_learner(preproc %>>% miA_impute %>>% po("encode") %>>%
  po("learner", lrn("surv.glmnet", predict_sets = c("train", "test"))))
ranlearner = as_learner(preproc %>>% ran_impute %>>% po("encode") %>>%
  po("learner", lrn("surv.glmnet", predict_sets = c("train", "test"))))
Train Learners
simlearner$train(task)
baglearner$train(task)
# knnlearner$train(task)
miclearner$train(task)
miAlearner$train(task)
# ranlearner$train(task)