3

In the documentation of the function cv.glmnet(), it is given that:

lambda.1se :
largest value of lambda such that error is within 1 standard error of the minimum.

This means that lambda.1se gives the lambda whose error (cvm) is within one standard error of the minimum error.
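Taken at face value, that rule can be sketched on toy numbers (the lambda, cvm, and se values below are illustrative, not from glmnet):

```r
# Illustrative CV results: lambda values and their mean CV errors,
# plus one standard error of the minimum (however it is computed)
lambda <- c(1.00, 0.50, 0.25, 0.10, 0.05)
cvm    <- c(60, 50, 45, 43, 44)
se     <- 3                                  # 1 SE of the minimum error

threshold  <- min(cvm) + se                  # 43 + 3 = 46
lambda.1se <- max(lambda[cvm <= threshold])  # largest lambda with cvm within 1 SE
lambda.1se  # 0.25
```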

So, while trying to check this fact: there is a data set Boston in the MASS library. I performed cross-validation using the lasso:

x = model.matrix(crim~.-1,data=Boston)#-1 for removing the intercept column
y = Boston$crim
cv.lasso = cv.glmnet(x,y,type.measure = "mse",alpha=1)

And the value of cv.lasso$lambda.min comes out to be:

> cv.lasso$lambda.min
[1] 0.05630926

And, the value of cv.lasso$lambda.1se is:

> cv.lasso$lambda.1se
[1] 3.375651

Now, look at this:

> std(cv.lasso$cvm)
[1] 0.7177808

where std is a function that returns the standard error of the values passed to it.1
And the minimum value of cvm can be found as:

> cv.lasso$cvm[cv.lasso$lambda==cv.lasso$lambda.min]
[1] 42.95009

So, we add the standard error to the value of cvm and we get:

> 42.95009+0.7177808
[1] 43.66787

Even though there is no lambda value corresponding exactly to this cvm value, we can get an idea from the existing data:

[screenshot of the nearby lambda and cvm values omitted]

This means lambda.1se should be between 0.4784899 and 0.4359821. But that's absolutely not the case, so my gut feeling says I'm making a blunder here. Can you help me spot it?


1: Definition of std:

std<-function(x)
  sd(x)/sqrt(length(x))
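For example, applied to a small illustrative vector:

```r
# std() computes the standard error of its argument: sd divided by sqrt(n)
std <- function(x)
  sd(x) / sqrt(length(x))

std(c(2, 4, 6, 8))  # sd = 2.581989, n = 4, so this returns ~1.290994
```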
Mooncrater

2 Answers

8

I'll add a seed so one can replicate the results below:

library(glmnet)
library(MASS)
data("Boston")
x = model.matrix(crim~.-1,data=Boston)#-1 for removing the intercept column
y = Boston$crim
set.seed(100)
cv.lasso = cv.glmnet(x,y,type.measure = "mse",alpha=1)

The minimum cross-validated MSE is min(cv.lasso$cvm) = 43.51256. The corresponding lambda is cv.lasso$lambda.min = 0.01843874. The lambda.1se is cv.lasso$lambda.1se = 3.375651. This corresponds to a cross-validated MSE of

cv.lasso$cvm[which(cv.lasso$lambda == cv.lasso$lambda.1se)] = 57.5393

We can access the cross-validated standard errors directly from GLMNET's output as follows:

cv.lasso$cvsd[which(cv.lasso$lambda == cv.lasso$lambda.min)] = 15.40236

So the cross-validated MSE one standard error away is

43.51256 + 15.40236 = 58.91492 

This is just slightly higher than the cross-validated MSE at lambda.1se above (i.e. 57.5393). If we look at the cross-validated MSE at the lambda just before the lambda.1se, it is:

cv.lasso$cvm[which(cv.lasso$lambda == cv.lasso$lambda.1se)-1] = 59.89079

So now that we can reconcile GLMNET's output, let me explain why you are not getting the same thing using your calculation:

  1. cv.lasso$cvm contains the cross-validated mean MSE for each value of lambda.
  2. When we say 1 standard error, we are not talking about the standard error across the lambda's but the standard error across the folds for a given lambda.
  3. Continuing with the above point: at lambda.min, we have 10 folds. We fit 10 models and obtain 10 out-of-sample MSEs. The mean of these 10 MSEs is given by cv.lasso$cvm[which(cv.lasso$lambda == cv.lasso$lambda.min)]. The standard error of these 10 MSEs is given by cv.lasso$cvsd[which(cv.lasso$lambda == cv.lasso$lambda.min)]. What we are not given in the GLMNET output are the 10 MSEs themselves at lambda.min. If we had them, we would be able to replicate the standard error using the formula you have above.
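If those per-fold MSEs were available, the calculation would be as sketched below (the ten MSE values are made up for illustration; cv.glmnet does not return them unless keep = TRUE is used):

```r
# Hypothetical out-of-sample MSEs for the 10 folds at lambda.min
fold_mse <- c(27.5, 75.7, 19.9, 41.2, 38.6, 55.0, 33.3, 47.8, 29.4, 61.0)

cvm_at_min  <- mean(fold_mse)                         # what cvm would report
cvsd_at_min <- sd(fold_mse) / sqrt(length(fold_mse))  # approximately cvsd
cvm_at_min  # 42.94
```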

Let me know if this helps.

EDIT: Let's do an example where we define three folds in advance:

set.seed(100)
folds = sample(1:3, nrow(x), replace = T)
cv.lasso = cv.glmnet(x,y,type.measure = "mse",alpha=1, keep =T, foldid = folds)

Note that

> min(cv.lasso$cvm)
[1] 42.76584
> cv.lasso$cvsd[which.min(cv.lasso$cvm)]
[1] 17.89725

(These differ from the earlier example above because we've defined our own folds)

Note also that I have an additional parameter keep = T in the cv.glmnet call. This returns the fold predictions for each lambda. You can extract them for the optimal lambda by doing:

cv.lasso$fit.preval[,which.min(cv.lasso$cvm)]

Before we proceed, let's create a data frame with the response, the fold predictions, and the corresponding folds:

library(data.table)
OOSPred = data.table(y = y, 
                     predictions = cv.lasso$fit.preval[,which.min(cv.lasso$cvm)], 
                     folds = folds)

Here is a preview of the first 10 rows:

> head(OOSPred, 10)
          y predictions folds
 1: 0.00632  -0.7477977     1
 2: 0.02731  -1.3823830     1
 3: 0.02729  -3.4826143     2
 4: 0.03237  -4.4419795     1
 5: 0.06905  -3.4373021     2
 6: 0.02985  -2.5256505     2
 7: 0.08829   0.7343478     3
 8: 0.14455   1.1262462     2
 9: 0.21124   4.0507847     2
10: 0.17004   0.5859587     1

For example, for the cases where folds = 1, a model was built on folds #2 & #3 and then predictions were obtained for the observations in fold #1. We compute the MSE by fold now:

OOSPredSum = OOSPred[, list(MSE = mean((y - predictions)^2)), by = folds]

   folds      MSE
1:     1 27.51469
2:     2 75.72847
3:     3 19.93480

Finally, we return the mean MSE and standard error of the MSE across the folds

> OOSPredSum[, list("Mean MSE" = mean(MSE), "Standard Error" = sd(MSE)/sqrt(3))]
   Mean MSE Standard Error
1: 41.05932       17.47213

GLMNET may be performing a weighted mean and standard error (weighted by the number of observations in each fold), which is why the figures above are close but do not match exactly.
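One plausible weighted version can be sketched as follows; the fold sizes and the exact weighting scheme here are assumptions for illustration, not taken from the glmnet source:

```r
# Per-fold MSEs from the example above, with made-up fold sizes for a
# 3-fold split of 506 rows; weights are proportional to fold size
fold_mse  <- c(27.51469, 75.72847, 19.93480)
fold_size <- c(170, 168, 168)
w <- fold_size / sum(fold_size)

wmean <- sum(w * fold_mse)                   # weighted mean MSE
wse   <- sqrt(sum(w * (fold_mse - wmean)^2) / (length(fold_mse) - 1))  # weighted SE
```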

jav
  • I think you mean "This is just slightly higher than the cross-validated MSE at **`lambda.1se`**". – Mooncrater Aug 26 '17 at 17:28
  • Correct me if I'm wrong. So, your third point says that for each `lambda` value, we fit 10(exactly?) models which are produced by randomly selecting the 10 folds of cross-validation. Now, we fit these models, and find the MSE for each of these models. The `cvm` value is eventually, the mean of these MSEs. Thus we get a corresponding `cvm` for each `lambda` which is dependent upon the folds that cross-validation selects. Right? – Mooncrater Aug 27 '17 at 09:53
  • Yes, that is correct. I have added an example above to illustrate this. – jav Aug 27 '17 at 17:13
2

I think the procedure is:

  1. For each ƛ it fits k models (k = number of folds into which the data set has been split for the cross-validation algorithm)
  2. For each ƛ, it calculates the mean and standard error of those k out-of-fold errors, i.e. mean(k errors) and se(k errors)
  3. Let's say that we have ƛmin and serrorƛmin (calculated in step 2). Now, ƛse is defined as the "largest value of lambda such that error is within 1 standard error of the minimum". Then the condition for ƛse is:

    ƛse in [ƛmin - serrorƛmin, ƛmin + serrorƛmin]

  4. Then ƛse = max(ƛ) among the ƛ that fulfill the condition above.

I can show you an example:

lasso_cv <- cv.glmnet(x = x, y= endpoint, alpha = 1, lambda = lambdas_to_try,
                  standardize = TRUE, nfolds = 10,type.measure="auc",
                  family="binomial")

Note that ƛmin is:

lasso_cv$lambda.min
[1] 0.007742637

And serrorƛmin is:

serrorlmin <- lasso_cv$cvsd[which(lasso_cv$lambda == lasso_cv$lambda.min)]
serrorlmin

[1] 0.01058009

Then, the range in which ƛse is chosen is:

rang <- c(lasso_cv$lambda.min - serrorlmin, lasso_cv$lambda.min + serrorlmin)
rang
[1] -0.002837457  0.018322731

And to find it:

max(lasso_cv$lambda[lasso_cv$lambda>=rang[1] & lasso_cv$lambda<=rang[2]])
[1] 0.01629751

And this value matches ƛse!

lasso_cv$lambda.1se # 0.01629751

I hope it helps!