I want to create a custom filter that uses the LASSO method (glmnet with alpha = 1) to select features, i.e. the features to which glmnet assigns non-zero coefficients are the selected ones. The reason I want glmnet as a filter is so that I can then feed the selected features into another learner during resampling.
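
For reference, the selection step I have in mind looks like this in plain glmnet (a minimal sketch on simulated data; the toy data and variable names are my own, not from my actual analysis):

library(glmnet)

set.seed(1)
# Toy survival data: 20 numeric features, only the first three carry signal
x = matrix(rnorm(100 * 20), 100, 20)
time = rexp(100, rate = exp(x[, 1] - x[, 2] + 0.5 * x[, 3]))
status = rbinom(100, 1, 0.7)
y = cbind(time = time, status = status)

# Cross-validated LASSO Cox model (alpha = 1 is the glmnet default)
cvfit = cv.glmnet(x, y, family = "cox", alpha = 1, nfolds = 5)

# Features whose coefficient at lambda.min is non-zero are "selected"
cf = as.matrix(coef(cvfit, s = "lambda.min"))
rownames(cf)[cf[, 1] != 0]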

To avoid re-inventing the wheel, I have used makeLearner inside my filter, but I'm not sure whether that is a valid thing to do. This is what I have (I am only interested in survival data at this stage):

library(mlr)

makeFilter(
  name = "LASSO.surv",
  desc = "Use the LASSO method to select features for survival data",
  pkg = "glmnet",
  supported.tasks = c("surv"),
  supported.features = c("numerics", "factors", "ordered"),
  fun = function(task, nselect, folds, ...) {

    # Fit a cross-validated LASSO Cox model (alpha = 1) via mlr's glmnet wrapper
    lasso.lrn = makeLearner(cl = "surv.cvglmnet", id = "lasso",
                            predict.type = "response", alpha = 1, nfolds = folds)
    model = train(lasso.lrn, task)
    mod = model$learner.model

    # Coefficients at lambda.min; non-zero entries are the selected features.
    # Absolute values are used so large negative coefficients also rank high.
    coef.min = coef(mod, s = mod$lambda.min)
    res = abs(as.matrix(coef.min)[, 1])
    active.min = which(res != 0)
    res[active.min]
  })
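
As a quick sanity check that the makeFilter() call actually registered the filter with mlr, one can look it up in listFilterMethods():

"LASSO.surv" %in% as.character(listFilterMethods()$id)   # should be TRUE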

Then I use the filter like this:

task = wpbc.task
inner = makeResampleDesc("CV", iters = 5, stratify = TRUE)  # resampling used by benchmark()

cox.lrn = makeLearner(cl = "surv.coxph", id = "coxph", predict.type = "response")
cox.filt.lrn = makeFilterWrapper(
  makeLearner(cl = "surv.coxph", id = "cox.filt", predict.type = "response"),
  fw.method = "LASSO.surv",
  fw.perc = 0.5,
  folds = 5
)
learners = list(cox.lrn, cox.filt.lrn)
benchmark(learners, task, inner, measures = list(cindex), show.info = TRUE)

This seems to work (albeit slowly), although I realise I haven't properly thought through how fw.perc interacts with the filter yet (the filter itself just returns the non-zero coefficients). The filtered learner gives a better result than using the Cox model on its own:

Task: wpbc-example, Learner: coxph
Resampling: cross-validation
Measures:             cindex    
[Resample] iter 1:    0.5884477 
[Resample] iter 2:    0.6355556 
[Resample] iter 3:    0.5333333 
[Resample] iter 4:    0.5256410 
[Resample] iter 5:    0.7142857 


Aggregated Result: cindex.test.mean=0.5994527


Task: wpbc-example, Learner: cox.filt.filtered
Resampling: cross-validation
Measures:             cindex    
[Resample] iter 1:    0.5379061 
[Resample] iter 2:    0.6533333 
[Resample] iter 3:    0.7022222 
[Resample] iter 4:    0.6452991 
[Resample] iter 5:    0.6764706 


Aggregated Result: cindex.test.mean=0.6430463


       task.id        learner.id cindex.test.mean
1 wpbc-example             coxph        0.5994527
2 wpbc-example cox.filt.filtered        0.6430463
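
To double-check what the filter is actually doing, I can inspect which features survive in a single fit (both helpers below are standard mlr functions):

# Which features did the filter keep in one trained model?
filt.mod = train(cox.filt.lrn, task)
getFilteredFeatures(filt.mod)

# Or apply the filter to the task directly, mirroring fw.perc = 0.5;
# extra arguments such as folds are passed through to the filter
filt.task = filterFeatures(task, method = "LASSO.surv", perc = 0.5, folds = 5)
getTaskFeatureNames(filt.task)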

My question is: is it OK to use makeLearner and then train that learner inside a filter? Is there a better way?

panda
  • Have you read https://mlr.mlr-org.com/articles/tutorial/create_filter.html? We would also welcome a PR if you are going this way. A LASSO filter is one we have already thought about integrating. – pat-s Aug 22 '19 at 06:20
  • Also do not forget to use caching when filtering, this makes this operation almost instant. – pat-s Aug 22 '19 at 06:23
  • @pat-s are you saying this is the right way to go about it? Very happy to contribute to the package if I can, and I plan to write other filters like this, but I'm not sure if I understand the finer details well enough. Is there any documentation on caching? – panda Aug 22 '19 at 06:54
  • The final implementation needs to be more general, using `cvglmnet` rather than being tailored to survival models. I did not yet look in detail at all parts of your code, but contributions are always welcome. See https://mlr.mlr-org.com/news/index.html#filter---general for caching info. – pat-s Aug 22 '19 at 08:13
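
Following up on the caching suggestion: my understanding from the linked NEWS entry is that filter values can be cached via a cache argument (available from mlr 2.15 onwards; I have not verified that this works with custom filters):

# Hedged sketch: enable caching of filter values (assumes mlr >= 2.15);
# cache = TRUE stores computed filter values in a local cache directory,
# so repeated filtering of the same task should be nearly instant
cox.filt.lrn = makeFilterWrapper(
  makeLearner(cl = "surv.coxph", id = "cox.filt", predict.type = "response"),
  fw.method = "LASSO.surv",
  fw.perc = 0.5,
  cache = TRUE,
  folds = 5
)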

0 Answers