mlr - Access data between or after preprocessing steps

Question

Is there a way to access the data after performing a preprocessing step using a wrapper in mlr? Here is a stripped-down version of the code:

library(mlr)
library(mlbench)

data <- BreastCancer[, 2:11]
lrn <- makeLearner(cl = "classif.ranger",
                   predict.type = "prob",
                   fix.factors.prediction = TRUE,
                   importance = "permutation")

lrn <- makeImputeWrapper(lrn, classes = list(integer = imputeMedian(),
                                             numeric = imputeHist(),
                                             factor = imputeMode()))

lrn <- makeRemoveConstantFeaturesWrapper(lrn, na.ignore = TRUE)

# BreastCancer's target column is "Class", with levels "benign" and "malignant"
classif.task <- makeClassifTask(data = data, target = "Class", positive = "malignant")

model <- train(lrn, classif.task)

The code defines a learner, removes constant features and performs imputation. Is there a way to see what the data will look like after the removal of the constant features or, more interestingly, after the imputation?

score 1 · Accepted Answer · answered Oct 12 '17 at 15:15

This isn't implemented at the moment -- the point of the wrappers is to encapsulate everything so that you don't have to worry about the intermediate steps.

You can, however, use the impute() function to perform the same imputation separately, and removeConstantFeatures() to drop constant features in the same way. See the mlr tutorial for more information.
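
As a rough illustration of the separate approach, the sketch below applies both steps by hand to a small made-up data frame. The data and the column names (x1, x2, const, Class) are hypothetical and exist only for this example; the imputation rules mirror the ones passed to makeImputeWrapper in the question. It is a minimal sketch, not a reproduction of what the wrappers do internally.

```r
library(mlr)

# Hypothetical toy data: one constant column and a few missing values
df <- data.frame(
  x1 = c(1.5, NA, 3.2, 4.8, 2.1, 3.9),
  x2 = factor(c("a", "b", NA, "a", "a", "b")),
  const = rep(1L, 6),
  Class = factor(c("0", "1", "0", "1", "0", "1"))
)

# Step 1: drop constant features; dont.rm protects the target column
df.noconst <- removeConstantFeatures(df, na.ignore = TRUE, dont.rm = "Class")

# Step 2: impute with the same per-class rules as in the question
imp <- impute(df.noconst, target = "Class",
              classes = list(integer = imputeMedian(),
                             numeric = imputeHist(),
                             factor = imputeMode()))

# The imputed training data, available for inspection
df.imputed <- imp$data

# The stored description can be replayed on new data (hypothetical rows)
new.df <- data.frame(
  x1 = c(NA, 2.0),
  x2 = factor(c("b", NA), levels = c("a", "b"))
)
new.imputed <- reimpute(new.df, imp$desc)
```

Here df.imputed can be inspected or handed to another package, and reimpute() applies the imputation learned on the training data to new observations.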

Lars Kotthoff

Thanks for the quick response. The reason I'm using the wrappers (including custom ones not shown above) is to consolidate the training and scoring code in one function while passing arguments between the two (as well as to perform hyperparameter tuning if necessary). However, testing/debugging the code within the "real" workflow is often as useful as using unit tests. **And** there are cases where a third-party package, e.g. in my case the xgboostExplainer (https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211), requires the preprocessed training data. – notiv Oct 12 '17 at 19:33

If you wrote custom PreprocessingWrappers (http://mlr-org.github.io/mlr-tutorial/devel/html/preproc/index.html#preprocessing-wrapper-functions), you can simply store things in the global environment (<<-) or write them to disk in the train and predict functions. – jakob-r Oct 13 '17 at 08:13
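
A minimal sketch of that idea follows, assuming a pass-through wrapper built with makePreprocWrapper. It stores the data in a dedicated environment rather than using <<-, which is a substitution made here to keep the example tidy. The names captured, trainfun and predictfun are hypothetical, and iris with a few injected NAs plus classif.rpart stand in for the data and learner from the question.

```r
library(mlr)

# Environment that collects the intermediate data (used here instead of `<<-`)
captured <- new.env()

# Pass-through train function: record the data, return it unchanged
trainfun <- function(data, target, args) {
  captured$train <- data
  list(data = data, control = list())
}

# Pass-through predict function: record the data, return it unchanged
predictfun <- function(data, target, args, control) {
  captured$predict <- data
  data
}

# The capture wrapper sits directly around the base learner, so it sees
# the data after every outer wrapper (here: imputation) has been applied
lrn <- makeLearner("classif.rpart")
lrn <- makePreprocWrapper(lrn, train = trainfun, predict = predictfun)
lrn <- makeImputeWrapper(lrn, classes = list(numeric = imputeMedian()))

# Stand-in data: iris with a few missing values injected
dat <- iris
dat$Sepal.Length[c(3, 50, 120)] <- NA
task <- makeClassifTask(data = dat, target = "Species")

model <- train(lrn, task)
pred <- predict(model, newdata = dat[1:5, ])

# captured$train and captured$predict now hold the imputed data
head(captured$train)
```

Because outer wrappers preprocess the data before passing it inward, moving the capture wrapper to a different position in the chain exposes the data at a different stage.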
