"standardize = " option in glmnet package

Question

I have a question about the standardize option in the glmnet package.
I understand that scaling or standardizing the dataset is necessary in regression analysis in order to make the coefficients meaningful.
Usually, for a plain linear regression (e.g., using the glm function in R), I manually scale the dataset with scale() before I fit the model.
However, it seems that in the glmnet package (for regularized regression), the standardize option standardizes the dataset by itself, thereby making the coefficients meaningful (comparable). Am I correct?

If this is correct, suppose that I run the following code and the variable "x3" turns out to have the largest coefficient in absolute value. Can I then conclude that "x3" is the most important variable for discriminating between the categories?

I am looking forward to hearing any opinions!! Thanks.

library(glmnet)   # cv.glmnet
library(caTools)  # sample.split

set.seed(12345)
example.dat <- data.frame(Category = rbinom(100, 1, 0.5),
                          x1 = rpois(100, 10),
                          x2 = rnorm(100, 3, 10),
                          x3 = rbeta(100, 8, 20),
                          x4 = rnorm(100, -3, 45),
                          x5 = rnorm(100, 1000, 10000))

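# 70/30 train/test split on the response; avoid naming the result `sample`,
# which would shadow base::sample
in.train <- sample.split(example.dat$Category, SplitRatio = 0.70)
train <- subset(example.dat, in.train)
test  <- subset(example.dat, !in.train)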

set.seed(12345)
lasso.fit <- cv.glmnet(data.matrix(train[,-1]),
                       train[,1], 
                       family         = "binomial",
                       nfolds         = nrow(train), # LOOCV
                       grouped        = FALSE,
                       type.measure   = "class",
                       alpha          = 0.6,
                       standardize    = TRUE,
                       standardize.response = TRUE)  # documented for family = "mgaussian"; should have no effect here
print(lasso.fit)
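# Absolute coefficients at lambda.1se, sorted in decreasing order;
# the last line lists the names of the non-zero ones
coef.abs   <- as.matrix(abs(coef(lasso.fit, s = "lambda.1se")))
coef.order <- as.matrix(coef.abs[order(coef.abs, decreasing = TRUE), ])
rownames(as.matrix(coef.order[coef.order[, 1] > 0, ]))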
# [1] "x3"          "(Intercept)"

score 1 · Accepted Answer · answered Sep 03 '21 at 19:25

A bit of a late response, but I hope it helps.

Keep in mind that when you use the built-in standardize option, glmnet returns the coefficients on the original scale (see the excerpt from the docs below), so their magnitudes still depend on each predictor's units. I would be careful about drawing that conclusion.

Whenever I want to compare coefficients on the same scale, I use scale() to standardize the predictors before running glmnet. That way, the coefficients that come back are on the standardized scale and can be compared directly.
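
Here is a minimal sketch of that approach. It assumes simulated data; the names x, y and x.scaled and the value lambda = 0.05 are made up for illustration and are not taken from the question.

```r
library(glmnet)

set.seed(1)
n <- 200
# Simulated predictors on very different scales (illustrative only)
x <- cbind(x1 = rnorm(n, 0, 1),
           x2 = rnorm(n, 0, 100),
           x3 = rnorm(n, 0, 0.01))
# Simulated binary response driven by x1 and x2
y <- rbinom(n, 1, plogis(x[, "x1"] + x[, "x2"] / 100))

# Centre every column to mean 0 and scale it to sd 1
x.scaled <- scale(x)

# standardize = FALSE because the columns are already on a common scale
fit <- glmnet(x.scaled, y, family = "binomial", alpha = 0.6,
              standardize = FALSE)

# Each slope is now a change in log-odds per one standard deviation
# of the corresponding predictor (lambda = 0.05 is an arbitrary choice)
coef(fit, s = 0.05)
```

Even on a common scale, the size of a penalized coefficient is only a rough guide to importance, so I would still read the ranking with some caution.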

standardize
Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE. If variables are in the same units already, you might not wish to standardize.

glmnet documentation
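
If you would rather keep standardize = TRUE, one option is to rescale the returned slopes yourself: multiplying each original-scale slope by its predictor's standard deviation gives the change in log-odds per one standard deviation. This is a rough sketch on simulated data; the names x, y, beta.orig, col.sd and beta.per.sd and the value lambda = 0.05 are my own assumptions for illustration.

```r
library(glmnet)

set.seed(1)
n <- 200
# Simulated predictors on very different scales (illustrative only)
x <- cbind(x1 = rnorm(n, 0, 1),
           x2 = rnorm(n, 0, 100),
           x3 = rnorm(n, 0, 0.01))
y <- rbinom(n, 1, plogis(x[, "x1"] + x[, "x2"] / 100))

# Let glmnet standardize internally; coefficients come back on the original scale
fit <- glmnet(x, y, family = "binomial", alpha = 0.6, standardize = TRUE)

# Slopes on the original scale, intercept dropped (lambda = 0.05 is arbitrary)
beta.orig <- as.matrix(coef(fit, s = 0.05))[-1, 1]

# Sample standard deviation of each predictor column
col.sd <- apply(x, 2, sd)

# Change in log-odds per one standard deviation of each predictor
beta.per.sd <- beta.orig * col.sd
beta.per.sd
```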

Colin H

Thank you so much, @Colin H. If I use scale() before running glmnet, do I then need to set standardize = FALSE in order to get standardized coefficients for comparison? – KLee Sep 03 '21 at 22:55
Exactly, there is no need to standardize again if you've already done so manually before passing the data to the model. That said, although I haven't tested it personally, the standardize option shouldn't make much difference if you've already scaled your data. – Colin H Sep 07 '21 at 13:55