I'll add a seed so one can replicate the results below:
library(glmnet)
library(MASS)

data("Boston")
x = model.matrix(crim ~ . - 1, data = Boston)  # -1 removes the intercept column
y = Boston$crim

set.seed(100)
cv.lasso = cv.glmnet(x, y, type.measure = "mse", alpha = 1)
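As an aside, the cross-validation curve behind the numbers below can be plotted directly; plot.cv.glmnet draws the mean MSE at each lambda with one-standard-error bars and marks lambda.min and lambda.1se with dotted vertical lines:

plot(cv.lasso)  # CV MSE vs. log(lambda), with error bars and both reference lambdas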
The minimum cross-validated MSE is min(cv.lasso$cvm) = 43.51256. The corresponding lambda is cv.lasso$lambda.min = 0.01843874. The lambda.1se is cv.lasso$lambda.1se = 3.375651. This corresponds to a cross-validated MSE of

cv.lasso$cvm[which(cv.lasso$lambda == cv.lasso$lambda.1se)] = 57.5393
We can access the cross-validated standard errors directly from GLMNET's output as follows:
cv.lasso$cvsd[which(cv.lasso$lambda == cv.lasso$lambda.min)] = 15.40236
So the cross-validated MSE one standard error away is
43.51256 + 15.40236 = 58.91492
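The same number can be computed directly from the stored quantities (a small sketch using the objects defined above):

idx.min = which(cv.lasso$lambda == cv.lasso$lambda.min)  # position of lambda.min in the lambda sequence
cv.lasso$cvm[idx.min] + cv.lasso$cvsd[idx.min]           # = 58.91492, the one-standard-error threshold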
This value (58.91492) is just slightly higher than the cross-validated MSE at lambda.1se above (i.e. 57.5393). If we look at the cross-validated MSE at the lambda just before lambda.1se in the sequence, it is:

cv.lasso$cvm[which(cv.lasso$lambda == cv.lasso$lambda.1se)-1] = 59.89079
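This is no coincidence: lambda.1se is defined as the largest lambda whose mean CV error is still within one standard error of the minimum, and we can sketch that selection rule ourselves with the objects above:

# largest lambda whose cvm stays within one standard error of the minimum cvm
thresh = min(cv.lasso$cvm) + cv.lasso$cvsd[which.min(cv.lasso$cvm)]
max(cv.lasso$lambda[cv.lasso$cvm <= thresh])  # should reproduce cv.lasso$lambda.1se = 3.375651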
Now that we have reconciled GLMNET's output, let me explain why you are not getting the same thing with your calculation:
- cv.lasso$cvm contains the cross-validated mean MSE for each value of lambda.
- When we say 1 standard error, we are not talking about the standard error across the lambdas but the standard error across the folds for a given lambda.
- Continuing with the above point, at lambda.min we have 10 folds. We fit 10 models and obtain 10 out-of-sample MSE's. The mean of these 10 MSE's is given by cv.lasso$cvm[which(cv.lasso$lambda == cv.lasso$lambda.min)]. The standard error of these 10 MSE's is given by cv.lasso$cvsd[which(cv.lasso$lambda == cv.lasso$lambda.min)]. What we are not given in the GLMNET output are the 10 individual MSE's at lambda.min. If we had them, we could replicate the standard error using the formula you have above (see the sketch after this list).
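To make that last point concrete, here is a minimal sketch with a made-up vector of per-fold MSE's (hypothetical numbers, not GLMNET output):

fold.mse = c(40, 55, 38, 61, 35, 49, 42, 44, 37, 34)  # hypothetical out-of-sample MSE's, one per fold
mean(fold.mse)           # the analogue of cvm at a given lambda
sd(fold.mse) / sqrt(10)  # the analogue of cvsd: the standard error across the folds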
Let me know if this helps.
EDIT: Let's do an example where we define three folds in advance:
set.seed(100)
folds = sample(1:3, nrow(x), replace = T)  # randomly assign each observation to one of 3 folds
cv.lasso = cv.glmnet(x, y, type.measure = "mse", alpha = 1, keep = T, foldid = folds)
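Since the folds are assigned at random, their sizes will be roughly but not exactly equal, which becomes relevant to the weighting remark at the end; a quick check:

table(folds)  # number of observations assigned to each of the 3 folds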
Note that
> min(cv.lasso$cvm)
[1] 42.76584
> cv.lasso$cvsd[which.min(cv.lasso$cvm)]
[1] 17.89725
(These differ from the earlier example above because we've defined our own folds)
Note also that I have passed an additional parameter, keep = T, in the cv.glmnet call. This returns the out-of-fold predictions for each lambda. You can extract them for the optimal lambda by doing:
cv.lasso$fit.preval[,which.min(cv.lasso$cvm)]
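As a quick sanity check (a sketch; exact dimensions can depend on the glmnet version), fit.preval holds one out-of-fold prediction per observation per lambda:

nrow(cv.lasso$fit.preval) == nrow(x)  # expect TRUE: one row per observation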
Before we proceed, let's create a data frame with the response, the fold predictions, and the corresponding folds:
library(data.table)
OOSPred = data.table(y = y,
                     predictions = cv.lasso$fit.preval[, which.min(cv.lasso$cvm)],
                     folds = folds)
Here is a preview of the first 10 rows:
> head(OOSPred, 10)
y predictions folds
1: 0.00632 -0.7477977 1
2: 0.02731 -1.3823830 1
3: 0.02729 -3.4826143 2
4: 0.03237 -4.4419795 1
5: 0.06905 -3.4373021 2
6: 0.02985 -2.5256505 2
7: 0.08829 0.7343478 3
8: 0.14455 1.1262462 2
9: 0.21124 4.0507847 2
10: 0.17004 0.5859587 1
For example, for the cases where folds = 1, a model was built on folds #2 & #3 and then predictions were obtained for the observations in fold #1. We now compute the MSE by fold:
OOSPredSum = OOSPred[, list(MSE = mean((y - predictions)^2)), by = folds]

> OOSPredSum
   folds      MSE
1:     1 27.51469
2:     2 75.72847
3:     3 19.93480
Finally, we compute the mean MSE and the standard error of the MSE across the folds:
> OOSPredSum[, list("Mean MSE" = mean(MSE), "Standard Error" = sd(MSE)/sqrt(3))]
Mean MSE Standard Error
1: 41.05932 17.47213
GLMNET may be performing a weighted mean and standard error (weighted by the number of observations in each fold), which is why the figures above are close but do not match exactly.
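If that conjecture is right, reweighting each fold's MSE by its size should move our numbers toward GLMNET's. Here is a sketch of that weighting (an assumed scheme, not documented GLMNET behavior):

# weight each fold's MSE by the number of observations in that fold (assumption)
foldSizes = OOSPred[, list(N = .N), by = folds]      # observations per fold
merged = merge(OOSPredSum, foldSizes, by = "folds")
merged[, list("Weighted Mean MSE" = weighted.mean(MSE, N))]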