I want to create a custom filter that uses the LASSO method (glmnet with alpha=1) to select features - i.e. the features to which glmnet assigns non-zero coefficients are the selected features. The reason I want glmnet as a filter is so that I can then feed those selected features into another learner during resampling.
To avoid reinventing the wheel, I have used makeLearner inside my filter, but I'm not sure whether this is a valid thing to do. This is what I have (I am only interested in survival data at this stage):
makeFilter(
  name = "LASSO.surv",
  desc = "Use the LASSO method to select features for survival data",
  pkg = "glmnet",
  supported.tasks = c("surv"),
  supported.features = c("numerics", "factors", "ordered"),
  fun = function(task, nselect, folds, ...) {
    # Fit a cross-validated LASSO Cox model via mlr's surv.cvglmnet learner
    lasso.lrn = makeLearner(cl = "surv.cvglmnet", id = "lasso",
                            predict.type = "response", alpha = 1, nfolds = folds)
    model = train(lasso.lrn, task)
    mod = model$learner.model
    # Coefficients at the lambda minimising the cross-validated error
    coef.min = coef(mod, s = mod$lambda.min)
    res = as.matrix(coef.min)[, 1]
    # Keep the non-zero coefficients and return their absolute values as
    # scores, so that large negative coefficients also rank as important
    active.min = which(res != 0)
    abs(res[active.min])
  }
)
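Before wiring the filter into a wrapper, it can be checked in isolation with generateFilterValuesData, which calls the filter's fun once on the full task and returns the scores it produced (a quick sanity check, assuming the makeFilter call above has already been run in the session):

```r
library(mlr)

# Run the custom filter once on the full task; the folds argument is
# passed through to the filter's fun.
fv = generateFilterValuesData(wpbc.task, method = "LASSO.surv", folds = 5)
print(fv)  # feature names with the scores the filter assigned
```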
Then I use the filter like this:
task = wpbc.task
inner = makeResampleDesc("CV", iters = 5, stratify = TRUE)  # resampling for the benchmark
cox.lrn = makeLearner(cl = "surv.coxph", id = "coxph", predict.type = "response")
cox.filt.lrn = makeFilterWrapper(
  makeLearner(cl = "surv.coxph", id = "cox.filt", predict.type = "response"),
  fw.method = "LASSO.surv",
  fw.perc = 0.5,
  folds = 5
)
learners = list(cox.lrn, cox.filt.lrn)
benchmark(learners, task, inner, measures = list(cindex), show.info = TRUE)
This seems to work (albeit slowly), although I realise I haven't really made use of the fw.perc argument yet. The filtered learner gives a better result than the Cox model on its own:
Task: wpbc-example, Learner: coxph
Resampling: cross-validation
Measures: cindex
[Resample] iter 1: 0.5884477
[Resample] iter 2: 0.6355556
[Resample] iter 3: 0.5333333
[Resample] iter 4: 0.5256410
[Resample] iter 5: 0.7142857
Aggregated Result: cindex.test.mean=0.5994527
Task: wpbc-example, Learner: cox.filt.filtered
Resampling: cross-validation
Measures: cindex
[Resample] iter 1: 0.5379061
[Resample] iter 2: 0.6533333
[Resample] iter 3: 0.7022222
[Resample] iter 4: 0.6452991
[Resample] iter 5: 0.6764706
Aggregated Result: cindex.test.mean=0.6430463
task.id learner.id cindex.test.mean
1 wpbc-example coxph 0.5994527
2 wpbc-example cox.filt.filtered 0.6430463
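For reference, the filter can also be applied directly with filterFeatures to see which features it keeps on the full task (again assuming the makeFilter call above has been run; perc here plays the role of fw.perc):

```r
library(mlr)

# Apply the custom filter directly to the task; perc = 0.5 keeps the top
# half of the features by score, and folds is passed through to the filter.
filtered.task = filterFeatures(wpbc.task, method = "LASSO.surv",
                               perc = 0.5, folds = 5)
getTaskFeatureNames(filtered.task)  # the surviving features
```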
My question is: is it OK to call makeLearner and then train that learner inside a filter? Is there a better way?