mlr3: How to filter with mlr on training data set and apply results to model training?

Question

When creating a filter in mlr3, how do you compute the filter scores on only the training data?

Once the filter is created, how do you apply it to the modeling process so that training uses only the features whose filter values are above a certain threshold?

library(mlr3)
library(mlr3filters)
library(mlr3learners)
library(tidyverse)


data(iris)
iris <- iris %>%
  select(-Species)
  
tsk <- mlr3::TaskRegr$new("iris", 
                          backend = iris, 
                          target = "Sepal.Length")

#split train and test
trn_ids <- sample(tsk$row_ids, floor(0.8 * length(tsk$row_ids)), replace = FALSE)
tst_ids <- setdiff(tsk$row_ids, trn_ids)

#create a filter
filter = flt("correlation", method = "spearman")

# Question 1: how to calculate the filter only for the train IDs?
filter$calculate(tsk)
print(filter)

# Question 2: how to use only variables with correlation X or greater in training?
learner <- mlr_learners$get("regr.glmnet")
learner$train(tsk, row_ids = trn_ids)
prediction <- learner$predict(tsk, row_ids = tst_ids)
prediction$response

score 1 · Answer 1 · answered Aug 21 '20 at 17:51

Filters can be wrapped into a Learner using mlr3pipelines.

The mlr3gallery has an example of this (see the section "Feature Filtering").

The basic recipe is to create a graph like so:

library(mlr3pipelines)
fpipe = po("filter", flt("mim"), filter.nfeat = 3) %>>% lrn("regr.glmnet")

and wrap this in a GraphLearner:

lrnr = GraphLearner$new(fpipe)

lrnr can now be used like any other learner and internally filters features according to the specified filter before training the learner.
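
Applied to the code from the question, this could look roughly like the sketch below. It is an illustration under a few assumptions rather than the one right way to do it: it reuses the question's Spearman correlation filter in place of "mim", the cutoff of 0.5 and the seed are arbitrary values picked for the example, and it uses the filter.cutoff parameter of po("filter"), which the documentation describes as the minimum filter score a feature needs in order to be kept. The first part scores the features on the training rows only (Question 1); the second part lets the GraphLearner do the filtering itself, so training with row_ids = trn_ids means the filter never sees the test rows (Question 2).

```r
library(mlr3)
library(mlr3filters)
library(mlr3learners)
library(mlr3pipelines)

set.seed(1)  # arbitrary seed, only so that the split is reproducible

# Same task as in the question: iris without Species, target Sepal.Length
dat <- iris[, setdiff(names(iris), "Species")]
tsk <- TaskRegr$new("iris", backend = dat, target = "Sepal.Length")

# Split train and test
trn_ids <- sample(tsk$row_ids, floor(0.8 * length(tsk$row_ids)))
tst_ids <- setdiff(tsk$row_ids, trn_ids)

# Question 1: calculate the filter on a copy of the task that is
# restricted to the training rows (the original task stays untouched)
filter <- flt("correlation", method = "spearman")
filter$calculate(tsk$clone()$filter(trn_ids))
print(as.data.table(filter))

# Question 2: keep only features with a filter score of at least 0.5
# (0.5 is an illustrative value, not a recommendation)
fpipe <- po("filter", filter = flt("correlation", method = "spearman"),
            filter.cutoff = 0.5) %>>%
  lrn("regr.glmnet")
lrnr <- GraphLearner$new(fpipe)

# The filter inside the graph is calculated on trn_ids only
lrnr$train(tsk, row_ids = trn_ids)
prediction <- lrnr$predict(tsk, row_ids = tst_ids)
prediction$score(msr("regr.rmse"))
```

Note that the correlation filter scores features by the absolute value of the correlation, so the cutoff applies to the strength of the relationship regardless of sign. Also keep in mind that glmnet needs at least two features, so a very strict cutoff can make training fail.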
