I want to impute missing values using bagging and KNN. How do I do that correctly with mlr3?

Looking at some examples, it seems possible with mlr3pipelines via po("imputelearner", lrn(<bagging or KNN learner>)), but I am not 100% sure. When I tried it, I ran into a "not compatible" problem. Does it matter whether I use a classif or a regr learner for imputation when my task is of a different type? For example, I have a classif task but use a regr learner to impute, or I have a surv task and use either a classif or a regr learner to impute.
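A minimal sketch of the po("imputelearner") idea, under stated assumptions: in mlr3pipelines the type of the imputation learner decides which feature types get imputed (a regr learner for numeric/integer features, a classif learner for factor/logical features), independently of the type of the outer task. The toy task built from iris and the ids impute_num / impute_fct are illustrative only. The same PipeOps would be expected to apply to a TaskSurv, since they only look at feature columns, but that is not verified here.

```r
library(mlr3)
library(mlr3pipelines)

set.seed(1)

# Toy regression task with NAs in one numeric and one factor feature
# (illustrative data, not the actg data from the question).
df = iris
df$Sepal.Length[sample(nrow(df), 15)] = NA
df$Species[sample(nrow(df), 15)] = NA
task = as_task_regr(df, target = "Petal.Width", id = "iris_na")

# regr learner -> imputes numeric/integer features
impute_num = po("imputelearner", learner = lrn("regr.rpart"), id = "impute_num")
# classif learner -> imputes factor/logical features
impute_fct = po("imputelearner", learner = lrn("classif.rpart"), id = "impute_fct")

# rpart can handle NAs in the remaining (context) features itself,
# so no extra preprocessing is needed inside the imputation learner.
graph = impute_num %>>% impute_fct

imputed = graph$train(task)[[1]]
print(imputed$missings())
```

Chaining one regr-based and one classif-based PipeOp is one way to cover both feature types; each needs its own id so the two can live in the same graph.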

I also found another package, NADIA, that is compatible with mlr3pipelines (PipeOpVIM_kNN, PipeOpmissForest, and PipeOpMice). It would be nice to see a working example, as none of them worked for me.

The four learners that are not commented out in the code below produce warnings when I train them. Initially, I thought this was not a big deal, but tuning these learners always fails.

Task

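library(mlr3)
library(mlr3proba)      # provides tsk("actg") and TaskSurv
library(mlr3pipelines)

set.seed(1)             # make the NA positions reproducible
na <- sample(1:1151, 1151*0.1)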
data = tsk("actg")$data()

data$age[na]    <- NA
data$tx[na]     <- NA

task = TaskSurv$new("actg_na", backend = data, time = "time", event = "status")

PipeOps

preproc     = po("removeconstants", ratio =  0.05) # remove zero-variance and near-zero-variance features

sim_impute  = po("imputemedian", affect_columns = selector_type("numeric")) %>>%
              po("imputemode",   affect_columns = selector_type("factor"))

bag_impute  = NADIA::PipeOpmissForest$new()

knn_impute  = NADIA::PipeOpVIM_kNN$new() 

mic_impute  = NADIA::PipeOpMice$new()

miA_impute  = NADIA::PipeOpMice_A$new()

ran_impute  = po("imputesample")
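As a possible alternative to the NADIA PipeOps, a KNN imputer can be sketched with mlr3pipelines alone. This assumes that mlr3learners provides regr.kknn and classif.kknn in the installed version (they need the kknn package). Because kknn cannot handle NAs in its own input features, each imputation learner is wrapped in a small graph that first fills the context columns with simple imputers. The names knn_regr, knn_classif and knn_impute_alt, and the toy task, are hypothetical.

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3learners)   # assumed source of regr.kknn / classif.kknn

set.seed(1)

# Toy task with NAs in a numeric and a factor feature (illustrative only).
df = iris
df$Sepal.Length[sample(nrow(df), 15)] = NA
df$Species[sample(nrow(df), 15)] = NA
task = as_task_regr(df, target = "Petal.Width", id = "iris_na")

# Wrap kknn so that NAs in the context features are filled before KNN sees them.
knn_regr = as_learner(
  po("imputehist") %>>% po("imputemode") %>>% lrn("regr.kknn")
)
knn_classif = as_learner(
  po("imputehist") %>>% po("imputemode") %>>% lrn("classif.kknn")
)

# One PipeOp per feature type: regr for numeric, classif for factor features.
knn_impute_alt =
  po("imputelearner", learner = knn_regr,    id = "knn_impute_num") %>>%
  po("imputelearner", learner = knn_classif, id = "knn_impute_fct")

imputed_knn = knn_impute_alt$train(task)[[1]]
print(imputed_knn$missings())
```

A tree ensemble learner could presumably be swapped in the same way to get a bagging-style imputer, wrapped likewise if it lacks the "missings" property.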

Learners

simlearner  = as_learner(preproc %>>% sim_impute %>>% po("encode") %>>% 
                         po("learner", lrn("surv.glmnet", predict_sets = c("train", "test"))))

baglearner  = as_learner(preproc %>>% bag_impute %>>% po("encode") %>>% 
                         po("learner", lrn("surv.glmnet", predict_sets = c("train", "test"))))

knnlearner  = as_learner(preproc %>>% knn_impute %>>% po("encode") %>>% 
                         po("learner", lrn("surv.glmnet", predict_sets = c("train", "test"))))

miclearner  = as_learner(preproc %>>% mic_impute %>>% po("encode") %>>% 
                         po("learner", lrn("surv.glmnet", predict_sets = c("train", "test"))))

miAlearner  = as_learner(preproc %>>% miA_impute %>>% po("encode") %>>% 
                         po("learner", lrn("surv.glmnet", predict_sets = c("train", "test"))))

ranlearner  = as_learner(preproc %>>% ran_impute %>>% po("encode") %>>% 
                         po("learner", lrn("surv.glmnet", predict_sets = c("train", "test"))))

Train Learners

simlearner$train(task)

baglearner$train(task)

# knnlearner$train(task)

miclearner$train(task)

miAlearner$train(task)

# ranlearner$train(task)
Ali Alhadab
  • Can you please share the code that didn't work for you? – Lars Kotthoff Nov 30 '21 at 19:17
  • bag_impute = po("imputelearner", lrn("regr.rpart")) (can use knn learn with missing properties); knn_impute = NADIA::PipeOpVIM_kNN$new(); MICE_impute = NADIA::PipeOpMice$new(); MICE_A_impute = NADIA::PipeOpMice_A$new() – Ali Alhadab Dec 01 '21 at 16:11
  • That works fine for me. Can you please provide a complete example that allows to reproduce the problem you were running into? – Lars Kotthoff Dec 01 '21 at 17:15