4

I wrote this lasso code in R, and I got some beta values:

#Lasso
library(MASS)
library(glmnet)
Boston=na.omit(Boston)
x=model.matrix(crim~.,Boston)[,-1]
rownames(x)=c()
y=as.matrix(Boston$crim)
lasso.mod =glmnet(x,y, alpha =1, lambda = 0.1)
beta=as.matrix(rep(0,19))
beta=coef(lasso.mod)
matplot(beta,type="l",lty=1,xlab="L1",ylab="beta",main="lasso")

I want to plot the betas in a plot like this:

enter image description here

but I don't know what plot function in R I can use to do that.

M--
  • 25,431
  • 8
  • 61
  • 93
Majk
  • 87
  • 1
  • 9
  • 1
    I think `plot(lasso.mod)` will do that. Please provide the dataset to make the example reproducible (probably here loaded from an r package ?) – Gilles San Martin Feb 25 '18 at 22:12
  • @Gilles I added still the *library(MASS)* for the dataset. Here is the [link](https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/MASS/Boston.csv) to download it. I'm actually interested to know how can I do the plot from the beta values without using the `lasso.mod` – Majk Feb 25 '18 at 22:23
  • 1
    If you want to see the values of the coefficients for different levels of regularization you should also not fix lambda : `lasso.mod =glmnet(x,y, alpha =1)`, then `coef(lasso.mod)` and/or `plot(lasso.mod)` – Gilles San Martin Feb 25 '18 at 22:40
  • @Gilles thank you, but is there any way to plot the betas like plot(beta) or something, without doing plot(lasso.mod)? – Majk Feb 25 '18 at 22:52
  • Typical Cross Validated [/stats](https://stats.stackexchange.com/) question. Flagged for moving there. – ZF007 Feb 25 '18 at 23:24

1 Answers1

13

I don't understand why you don't want to use the build-in glmnet method but you can certainly reproduce its results (here with ggplot).
You still need the model object to extract the lambda values...

Edit: added Coefs vs L1 norm

Reproduce your minimal example

library(MASS)
library(glmnet)
#> Le chargement a nécessité le package : Matrix
#> Le chargement a nécessité le package : foreach
#> Loaded glmnet 2.0-13
library(ggplot2)
library(reshape)
#> 
#> Attachement du package : 'reshape'
#> The following object is masked from 'package:Matrix':
#> 
#>     expand

Boston=na.omit(Boston)
x=model.matrix(crim~.,Boston)[,-1]
y=as.matrix(Boston$crim)
lasso.mod =glmnet(x,y, alpha =1)
beta=coef(lasso.mod)

Extract the coef values and transform them in a long, tidy form suitable for ggplot

tmp <- as.data.frame(as.matrix(beta))
tmp$coef <- row.names(tmp)
tmp <- reshape::melt(tmp, id = "coef")
tmp$variable <- as.numeric(gsub("s", "", tmp$variable))
tmp$lambda <- lasso.mod$lambda[tmp$variable+1] # extract the lambda values
tmp$norm <- apply(abs(beta[-1,]), 2, sum)[tmp$variable+1] # compute L1 norm

Plot with ggplot : coef vs lambda

# x11(width = 13/2.54, height = 9/2.54)
ggplot(tmp[tmp$coef != "(Intercept)",], aes(lambda, value, color = coef, linetype = coef)) + 
    geom_line() + 
    scale_x_log10() + 
    xlab("Lambda (log scale)") + 
    guides(color = guide_legend(title = ""), 
           linetype = guide_legend(title = "")) +
    theme_bw() + 
    theme(legend.key.width = unit(3,"lines"))

The same with the base plot glmnet method :

# x11(width = 9/2.54, height = 8/2.54)
par(mfrow = c(1,1), mar = c(3.5,3.5,2,1), mgp = c(2, 0.6, 0), cex = 0.8, las = 1)
plot(lasso.mod, "lambda", label = TRUE)

Plot with ggplot : coef vs L1 norm

# x11(width = 13/2.54, height = 9/2.54)
ggplot(tmp[tmp$coef != "(Intercept)",], aes(norm, value, color = coef, linetype = coef)) + 
    geom_line() + 
    xlab("L1 norm") + 
    guides(color = guide_legend(title = ""), 
           linetype = guide_legend(title = "")) +
    theme_bw() + 
    theme(legend.key.width = unit(3,"lines"))

The same with the base plot glmnet method :

# x11(width = 9/2.54, height = 8/2.54)
par(mfrow = c(1,1), mar = c(3.5,3.5,2,1), mgp = c(2, 0.6, 0), cex = 0.8, las = 1)
plot(lasso.mod, "norm", label = TRUE)

Created on 2018-02-26 by the reprex package (v0.2.0).

Gilles San Martin
  • 4,224
  • 1
  • 18
  • 31
  • 1
    thank you, the reason why I didn't want to use the build-in method is because I have a lasso code from scratch which is very long, and I get the beta values which I have to plot without the build-in function. So, i asked this question as an example. – Majk Feb 26 '18 at 00:01
  • Ok I understand now ! I have added the other (default) representation of the glmnet plot method – Gilles San Martin Feb 26 '18 at 00:12
  • @Gilles Can you also add the number of the variables at the top of the plots, as in the case of glmnet plot method, if possible? – mert Jan 08 '20 at 12:08
  • If you want to perform the data cleaning/reshaping in the tidyverse, you can use the following: `tmp <- as_tibble(as.matrix(coef(lasso.mod)), rownames = "coef") %>% pivot_longer(cols = -coef, names_to = "variable", names_transform = list(variable = parse_number), values_to = "value") %>% group_by(variable) %>% mutate(lambda = lasso.mod$lambda[variable + 1], norm = sum(if_else(coef == "(Intercept)", 0, abs(value))))` – Jake Fisher Aug 19 '21 at 00:05