I would like to compare simple logistic regression models, where each model considers only a specified set of features. I would like to run these comparisons on resamples of the data.
The R package mlr allows me to select columns at the task level using dropFeatures. The code would be something like:
full_task = makeClassifTask(id = "full task", data = my_data, target = "target")
reduced_task = dropFeatures(full_task, setdiff(getTaskFeatureNames(full_task), list_feat_keep))
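To make the task-level approach reproducible, here is a minimal sketch. It assumes the built-in iris data, restricted to two classes, as a stand-in for my_data; the name binary_iris and the contents of list_feat_keep are placeholders for illustration, not my real data.

```r
library(mlr)

# Hypothetical stand-in for my_data: iris restricted to two classes,
# so that the target is binary, as logistic regression requires.
binary_iris = droplevels(iris[iris$Species != "setosa", ])

# Hypothetical set of features to keep (a character vector of column names).
list_feat_keep = c("Sepal.Length", "Petal.Width")

full_task = makeClassifTask(id = "full task", data = binary_iris, target = "Species")

# Drop every feature that is not in list_feat_keep.
reduced_task = dropFeatures(full_task, setdiff(getTaskFeatureNames(full_task), list_feat_keep))

# Inspect which features remain in the reduced task.
getTaskFeatureNames(reduced_task)
```

The feature subset is fixed by name here and no statistical procedure is involved, which is the behaviour I would like to get at the learner level.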
Then I can run a benchmark experiment on a list of tasks:
lrn = makeLearner("classif.logreg", predict.type = "prob")
rdesc = makeResampleDesc(method = "Bootstrap", iters = 50, stratify = TRUE)
bmr = benchmark(lrn, list(full_task, reduced_task), rdesc, measures = auc, show.info = FALSE)
How can I generate a learner that only considers a specified set of features? As far as I know, the filter and feature selection methods always apply some statistical procedure and do not allow selecting the features directly. Thank you!