10

I'm using the glmnet package to perform a LASSO regression. Is there a way to get the importance of the individual variables that were selected? I thought about ranking the coefficients obtained through the coef(...) command, i.e. the greater a coefficient's distance from zero, the more important the variable. Would that be a valid approach?

Thanks for your help!

cvfit = cv.glmnet(x, y, family = "binomial")
coef(cvfit, s = "lambda.min")

## 21 x 1 sparse Matrix of class "dgCMatrix"
##                    1
## (Intercept)  0.14936
## V1           1.32975
## V2           .      
## V3           0.69096
## V4           .      
## V5          -0.83123
## V6           0.53670
## V7           0.02005
## V8           0.33194
## V9           .      
## V10          .      
## V11          0.16239
## V12          .      
## V13          .      
## V14         -1.07081
## V15          .      
## V16          .      
## V17          .      
## V18          .      
## V19          .      
## V20         -1.04341
user86533
  • `glmnet` scales the input variables, so in some sense you pick the variables with the highest "scaled effect". It makes sense that these should be important, and there are a few papers that try to address this particular problem (the recent book by [Hastie and Tibshirani](http://www.amazon.co.uk/gp/product/1498712169/ref=as_li_tl?ie=UTF8&camp=1634&creative=6738&creativeASIN=1498712169&linkCode=as2&tag=shortcoursein-21) also discusses it). However, it really is a question for StackExchange – ekstroem Feb 17 '16 at 19:30

3 Answers

10

This is how the caret package does it.

To summarize, you can take the absolute value of the final coefficients and rank them. The ranked coefficients are your variable importance.

To view the source code, you can type

caret::getModelInfo("glmnet")$glmnet$varImp

If you don't want to use the caret package, you can run the following lines taken from it, and they should work.

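# Variable importance for a glmnet fit: absolute coefficient values at the chosen lambda
varImp <- function(object, lambda = NULL, ...) {

  ## skipping a few lines

  # Extract the coefficients at the requested lambda
  beta <- predict(object, s = lambda, type = "coef")
  if(is.list(beta)) {
    # Some fits return one coefficient matrix per class; bind them column-wise
    out <- do.call("cbind", lapply(beta, function(x) x[,1]))
    out <- as.data.frame(out, stringsAsFactors = TRUE)
  } else out <- data.frame(Overall = beta[,1])
  # Drop the intercept and take absolute values
  out <- abs(out[rownames(out) != "(Intercept)",,drop = FALSE])
  out
}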

Finally, call the function with your fit.

varImp(cvfit, lambda = cvfit$lambda.min)
Boxuan
  • I think the glmnet package produces unstandardized coefficients. – Josh Feb 14 '20 at 15:36
  • @Boxuan Thank you for the code to calculate `varImp` for a `glmnet` model. However, the variable importance values come out greater than 1, whereas `varImp` from the `caret` package always gives values in the range 0-1. Can you please respond to that? – UseR10085 Sep 21 '20 at 07:18
  • @BappaDas Could you share a reproducible example? My code is almost identical to the original `caret` code, so I don't see why there could be a discrepancy. – Boxuan Sep 22 '20 at 11:52
  • @Boxuan Please visit this [question](https://stackoverflow.com/questions/63989057/discripencies-in-variable-importance-calculation-for-glmnet-model-in-r) where you will find a reproducible example. – UseR10085 Sep 22 '20 at 12:03
  • When I try running this code, I get the error `no applicable method for varImp applied to an object of class "cv.glmnet"` – Mistakamikaze Oct 25 '20 at 22:49
  • @BappaDas By default, varImp scales all the importances between 0 and 100. Set scale = FALSE for it not to do that. – Alpha Bravo Apr 27 '21 at 18:00
8

Before you compare the magnitudes of the coefficients, you should normalize them by multiplying each coefficient by the standard deviation of the corresponding predictor. This answer has more detail and useful links: https://stats.stackexchange.com/a/211396/34615
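
A minimal sketch of that rescaling, under some assumptions: `x` is the numeric predictor matrix that was passed to `cv.glmnet`, the intercept is the first row of the coefficient matrix, and the helper name `scaled_importance` is made up for illustration. The simulated data is there only to make the example run on its own.

```r
library(glmnet)

# Simulated data (illustrative only): three predictors on very different scales.
set.seed(1)
n <- 200
x <- cbind(V1 = rnorm(n, sd = 1), V2 = rnorm(n, sd = 100), V3 = rnorm(n, sd = 0.01))
y <- rbinom(n, 1, plogis(x[, "V1"] + 0.01 * x[, "V2"]))

cvfit <- cv.glmnet(x, y, family = "binomial")

# Hypothetical helper: |coefficient| * sd(predictor), with the intercept dropped.
scaled_importance <- function(fit, x, s = "lambda.min") {
  beta <- as.matrix(coef(fit, s = s))[-1, 1]  # drop the intercept (first row)
  sds  <- apply(x, 2, sd)                     # one standard deviation per column of x
  sort(abs(beta) * sds, decreasing = TRUE)    # largest scaled effect first
}

imp <- scaled_importance(cvfit, x)
imp
```

Variables that the LASSO dropped keep a scaled importance of zero, so they sort to the bottom of the result.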

Community
Kent Johnson
2

It's pretty easy to use the contents of the cv.glmnet object to create an ordered list of coefficients...

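library(dplyr)  # provides %>%, arrange() and desc()

# Pull the non-zero coefficients out of the sparse matrix returned by coef()
coefList <- coef(cv.glmnet.MOD, s = 'lambda.1se')
coefList <- data.frame(var = coefList@Dimnames[[1]][coefList@i + 1], val = coefList@x)

# Sort by absolute coefficient size, largest first, and show the top 25
coefList %>%
  arrange(desc(abs(val))) %>%
  head(25)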

NOTE: as other posters have commented, to get a like-for-like comparison you need to scale (z-score) your numeric variables before the modelling step. Otherwise, a variable with a very small scale, e.g. range (0, 1), can be assigned a large coefficient when it sits in a model alongside variables with very large scales, e.g. range (-10000, 10000). The coefficient values are then not comparable with each other, which makes ranking them meaningless in most contexts.
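
As a rough illustration of that point, the sketch below fits the same model on raw and on z-scored predictors. The data is simulated and the names `x_scaled`, `fit_raw` and `fit_scaled` are made up for the example. As far as I know, glmnet standardizes internally by default (`standardize = TRUE`) but reports coefficients on the original scale, which is why scaling beforehand changes what `coef()` shows.

```r
library(glmnet)

# Simulated data (illustrative only): one small-scale and one large-scale predictor.
set.seed(1)
n <- 200
x <- cbind(small = runif(n, 0, 1), large = runif(n, -10000, 10000))
y <- rbinom(n, 1, plogis(2 * x[, "small"] + 0.0002 * x[, "large"]))

# z-score each column: subtract its mean, divide by its standard deviation
x_scaled <- scale(x)

fit_raw    <- cv.glmnet(x, y, family = "binomial")
fit_scaled <- cv.glmnet(x_scaled, y, family = "binomial")

coef(fit_raw, s = "lambda.1se")     # coefficients per original unit of each predictor
coef(fit_scaled, s = "lambda.1se")  # coefficients per one standard deviation
```

Only the second set of coefficients is on a common footing, so only those magnitudes are reasonable to rank against each other.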

Clancy Birrell