
I am trying to do feature selection using the glmnet package, and I have been able to run glmnet. However, I am having a tough time understanding the output. My goal is to get the list of genes and their respective coefficients so I can rank the genes by how relevant they are at separating my two groups of labels.

x = manual_normalized_melt[, colnames(manual_normalized_melt) %in% sig_0_01_ROTS$Gene]
y = cellID_reference$conditions

glmnet_l0 <- glmnet(x = as.matrix(x), y = y, family = "binomial", alpha = 1)

Any hints/instructions on how to go on from here? I know that the data I want is within glmnet_l0, but I am a bit unsure how to extract it.

Additionally, does anyone know if there is a way to use the L0-norm for feature selection in R?

Thank you so much!

Son nguyen

2 Answers


Here are some approaches in glmnet:

First, some data, since you did not post any (the iris data with Species collapsed to two levels):

data(iris)
x <- iris[, 1:4]
y <- iris[, 5]
y[y == "setosa"] <- "virginica"  # collapse to two classes for a binomial example
y <- factor(y)                   # drop the now-unused "setosa" level

Next, run a cross-validated model to see what the best lambda is:

library(glmnet)
model_cv <- cv.glmnet(x = as.matrix(x),
                      y = y,
                      family = "binomial",
                      alpha = 1,
                      nfolds = 5,
                      intercept = FALSE)

Here I chose to have 5-fold cross validation to determine the best lambda.

To see the coefficients at the best lambda:

coef(model_cv, s = "lambda.min")
#output
#5 x 1 sparse Matrix of class "dgCMatrix"
#                      1
#(Intercept)   .        
#Sepal.Length -0.7966676
#Sepal.Width   1.9291364
#Petal.Length -0.9502821
#Petal.Width   2.7113327

Here you can see that no variables were dropped (dropped variables show a . instead of a coefficient). If all the features are on the same scale (like gene expression data), you might consider adding standardize = FALSE as an argument to the glmnet call, since it defaults to TRUE; at least I would when modeling expression.
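To get from here to the ranked gene list the question asks about, you can turn the sparse coefficient matrix into a named vector, drop the intercept and the zeroed-out entries, and sort by absolute value. This is a sketch against the model_cv fit above; note that comparing absolute coefficient sizes as a relevance ranking only makes sense when the predictors are on a common or standardized scale:

```r
# Coefficients at the best lambda, as a plain named vector
cf <- as.matrix(coef(model_cv, s = "lambda.min"))[, 1]

# Drop the intercept and any coefficients shrunk exactly to zero
cf <- cf[setdiff(names(cf), "(Intercept)")]
cf <- cf[cf != 0]

# Order the surviving features by absolute coefficient size
cf[order(abs(cf), decreasing = TRUE)]
```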

To see the best lambda:

model_cv$lambda[which.min(model_cv$cvm)]

This is equivalent to the stored value model_cv$lambda.min; model_cv$lambda.1se gives the largest lambda whose cross-validation error is within one standard error of the minimum, a common choice for a sparser model.

Now you can make a model with all the data:

glmnet_l0 <- glmnet(x = as.matrix(x),
                    y = y,
                    family = "binomial",
                    alpha = 1,
                    intercept = FALSE)

You can plot it on the lambda scale and add a vertical line depicting best lambda:

plot(glmnet_l0, xvar = "lambda")
abline(v = log(model_cv$lambda[which.min(model_cv$cvm)]))

[plot: coefficient paths on the log-lambda scale, with a vertical line at the best lambda]

Here one can see the coefficients were hardly shrunk at all at the best lambda.

With higher-dimensional data you will see many coefficient traces go towards 0 before the best lambda kicks in, and many . entries in the coefficient matrix.

When using predict.glmnet, set s = model_cv$lambda[which.min(model_cv$cvm)], or it will generate predictions for every tested lambda.
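For example, a sketch reusing the fits from above; here newx is just the training matrix again, but in practice it would be held-out data with the same columns as x:

```r
best_lambda <- model_cv$lambda[which.min(model_cv$cvm)]

# With s fixed, predict returns one column of fitted probabilities
# instead of one column per lambda on the path
p <- predict(glmnet_l0,
             newx = as.matrix(x),
             s = best_lambda,
             type = "response")  # probabilities for the binomial model
head(p)
```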

Also check this post; it contains some other relevant information.

missuse

A while back I wrapped glmnet in a package for feature selection. You can either look at the code (beginning at line 89) or install the package with devtools::install_github('mlampros/FeatureSelection'). I also explained how it works in a blog post.

lampros