I have a question about the `standardize` option in the glmnet package.
I understand that scaling (standardizing) the predictors is necessary in regression analysis to make the coefficients meaningful, i.e., comparable across variables.
Usually, for an ordinary regression (e.g., with the `glm()` function in R), I scale the dataset manually with `scale()` before fitting the model.
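A minimal sketch of that manual workflow on simulated data (the data frame `dat` and the predictors `a` and `b` are illustrative, not my real data). It also shows that, for an unpenalized `glm()`, the slope on a `scale()`d predictor equals the raw slope times that predictor's SD, so scaling changes the units of the coefficients, not the fitted model.

```r
set.seed(1)
n <- 100
# Illustrative predictors on different scales
dat <- data.frame(a = rnorm(n, 10, 2), b = rnorm(n, 500, 300))
dat$y <- rbinom(n, 1, plogis(0.3 * (dat$a - 10) + 0.002 * (dat$b - 500)))

# Unscaled fit: each slope is per one unit of its predictor
fit_raw <- glm(y ~ a + b, data = dat, family = binomial)

# Scaled fit: scale() centres each predictor and divides by its SD,
# so each slope is per one standard deviation
dat_s <- dat
dat_s$a <- as.numeric(scale(dat$a))
dat_s$b <- as.numeric(scale(dat$b))
fit_std <- glm(y ~ a + b, data = dat_s, family = binomial)

# Raw slopes converted to per-SD units by hand
per_sd <- coef(fit_raw)[c("a", "b")] * sapply(dat[c("a", "b")], sd)
print(cbind(raw = coef(fit_raw)[c("a", "b")],
            scaled = coef(fit_std)[c("a", "b")],
            raw_times_sd = per_sd))
```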
However, with glmnet (regularized regression), it seems that the `standardize` option standardizes the data by itself, which would make the coefficients meaningful (comparable) without any manual scaling. Am I correct?
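For reference, the glmnet help page describes `standardize` as standardizing x before fitting and states that the coefficients are always returned on the original scale. A minimal sketch to see what that means in practice, assuming the glmnet package is installed; the simulated predictors `small` and `large` and the lambda values are illustrative assumptions. Both fits use `standardize = TRUE`, but the second one receives predictors already passed through `scale()`.

```r
library(glmnet)

set.seed(1)
n <- 200
# Illustrative simulated predictors on very different scales
X <- cbind(small = rnorm(n, 0, 1),
           large = rnorm(n, 0, 1000))
y <- rbinom(n, 1, plogis(0.8 * X[, "small"] + 0.001 * X[, "large"]))

# Fixed, decreasing lambda sequence shared by both fits (illustrative values)
lams <- c(0.05, 0.02, 0.01)

# Fit 1: raw predictors; glmnet standardizes internally
fit_raw <- glmnet(X, y, family = "binomial", alpha = 0.6,
                  lambda = lams, standardize = TRUE)

# Fit 2: the same predictors pre-scaled with scale() (mean 0, SD 1)
Z <- scale(X)
fit_scaled <- glmnet(Z, y, family = "binomial", alpha = 0.6,
                     lambda = lams, standardize = TRUE)

# Slopes at lambda = 0.01, intercept dropped
b_raw    <- as.matrix(coef(fit_raw, s = 0.01))[-1, 1]
b_scaled <- as.matrix(coef(fit_scaled, s = 0.01))[-1, 1]

# If b_raw is in original units, b_raw * SD should reproduce b_scaled
print(cbind(b_raw, b_scaled, b_raw_times_sd = b_raw * apply(X, 2, sd)))
```

Because glmnet standardizes both inputs to the same internal matrix, the two fits solve the same penalized problem, so `b_scaled` and `b_raw_times_sd` should agree, while `b_raw` stays in per-unit terms and its size depends on each predictor's units.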
If so, suppose I run the code below and find that `x3` has the largest coefficient in absolute value. Can I then conclude that `x3` is the most important variable for discriminating between the categories?
I look forward to hearing any opinions. Thanks!
```r
library(glmnet)   # cv.glmnet()
library(caTools)  # sample.split()

set.seed(12345)
example.dat <- data.frame(Category = rbinom(100, 1, 0.5),
                          x1 = rpois(100, 10),
                          x2 = rnorm(100, 3, 10),
                          x3 = rbeta(100, 8, 20),
                          x4 = rnorm(100, -3, 45),
                          x5 = rnorm(100, 1000, 10000))

# 70/30 split that keeps the Category proportions similar in both parts
in.train <- sample.split(example.dat$Category, SplitRatio = 0.70)
train <- subset(example.dat, in.train == TRUE)
test  <- subset(example.dat, in.train == FALSE)

set.seed(12345)
lasso.fit <- cv.glmnet(data.matrix(train[, -1]),  # predictors x1..x5
                       train[, 1],                # 0/1 response (Category)
                       family = "binomial",
                       nfolds = nrow(train),      # LOOCV
                       grouped = FALSE,
                       type.measure = "class",
                       alpha = 0.6,               # elastic net (alpha = 1 is the pure lasso)
                       standardize = TRUE,
                       standardize.response = TRUE)  # only used by family = "mgaussian"; no effect here
print(lasso.fit)

# Names of the nonzero coefficients at lambda.1se, largest absolute value first
cf <- as.matrix(coef(lasso.fit, s = "lambda.1se"))
cf <- cf[cf[, 1] != 0, , drop = FALSE]
rownames(cf)[order(abs(cf[, 1]), decreasing = TRUE)]
# [1] "x3" "(Intercept)"
```
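If the goal is to compare predictors measured in different units, one common convention is to put each slope on a per-standard-deviation scale (coefficient times the predictor's SD) before ranking by absolute value. A minimal sketch; `rank_per_sd` is a hypothetical helper, not a glmnet function, and the toy `coefs` and `X` values are made up.

```r
# Hypothetical helper (not part of glmnet): express each slope per one
# standard deviation of its predictor, drop zeros, and rank by |value|
rank_per_sd <- function(coefs, X) {
  slopes <- coefs[colnames(X)]          # intercept is dropped by matching names
  per_sd <- slopes * apply(X, 2, sd)    # coefficient times SD of the predictor
  per_sd <- per_sd[per_sd != 0]         # keep predictors the penalty retained
  per_sd[order(abs(per_sd), decreasing = TRUE)]
}

# Toy inputs (made up): b has a smaller per-unit slope than a but a larger SD
X <- cbind(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40), c = c(0, 1, 0, 1))
coefs <- c("(Intercept)" = 0.5, a = 1, b = 0.2, c = 0)
ranked <- rank_per_sd(coefs, X)
print(ranked)  # b is ranked above a even though its per-unit slope is smaller
```

With the objects above it could be called as `rank_per_sd(as.matrix(coef(lasso.fit, s = "lambda.1se"))[, 1], data.matrix(train[, -1]))`. Whichever SD convention is used (n or n - 1 denominator), the factor is common to all predictors, so the ranking does not change.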