
I am currently working with the lasso for feature selection. First I run a 10-fold cross-validation to find the shrinkage parameter with the lowest MSE. I then try to calculate the MSE on the training set myself; however, this value does not match the CV plot.

library(glmnet)

# 10-fold cross-validation to find the shrinkage parameter with the lowest MSE
cv <- cv.glmnet(as.matrix(mtcars[, c(1, 3:9)]), mtcars[, 2], alpha = 1, nfolds = 10, type.measure = "mse")
plot(cv)

# refit on the full data at lambda.min and compute the training-set MSE
lasso.mod <- glmnet(as.matrix(mtcars[, c(1, 3:9)]), mtcars[, 2], alpha = 1, lambda = cv$lambda.min)
y <- predict(lasso.mod, s = cv$lambda.min, newx = as.matrix(mtcars[, c(1, 3:9)]))
mean((mtcars[, 2] - y)^2)  # training-set MSE

What is the difference between the value computed above and the one below? The expression below was said to give the MSE of the lasso, so why are the two values not identical? To be precise, I use the same dataset for the cross-validation as for the calculation of the MSE.

cv$cvm[cv$lambda == cv$lambda.min]  # cross-validation MSE at lambda.min
– ELHL
    What exactly is your problem? Is it that the MSE you calculated is lower than the CV MSE? I think this result should be expected, since you are comparing out-of-sample fit with in-sample fit. But this question is about statistics, not programming. – Alex Jun 10 '17 at 21:59

1 Answer


The cross-validation MSE should not equal the MSE on the whole training set, because they are two different concepts.

The cross-validation MSE for a given lambda is computed as follows: divide the training data into 10 parts; for each part, fit the lasso model with that lambda on the other 9 parts and compute the MSE on the held-out part; then average the 10 resulting MSEs. Every observation is therefore predicted by a model that never saw it, which is quite different from the MSE on the training data, where the model is evaluated on the same observations it was fitted to.
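The procedure above can be sketched by hand. This is only an illustration of what `cv.glmnet` does internally: the fold assignment (`foldid`) and the fixed `lam` value here are my own assumptions, so the result will match `cv$cvm` exactly only if you pass the same `foldid` to `cv.glmnet`.

```r
library(glmnet)

x <- as.matrix(mtcars[, c(1, 3:9)])
y <- mtcars[, 2]

set.seed(1)
foldid <- sample(rep(1:10, length.out = nrow(x)))  # assign each row to one of 10 folds

lam <- 0.1  # an arbitrary fixed shrinkage parameter for illustration
fold_mse <- sapply(1:10, function(k) {
  # fit on the 9 folds that are not k
  fit <- glmnet(x[foldid != k, ], y[foldid != k], alpha = 1, lambda = lam)
  # evaluate on the held-out fold k (out-of-sample)
  pred <- predict(fit, newx = x[foldid == k, , drop = FALSE])
  mean((y[foldid == k] - pred)^2)
})
mean(fold_mse)  # cross-validation MSE: the average of the 10 held-out MSEs
```

Because each fold's MSE is computed on data the model did not see, this average is typically higher than the in-sample training MSE you computed.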

– Consistency
  • I did not expect the results to be that different, because the data are the same in both cases. But I fully understand your answer. I did some additional research and came across these posts, which might be helpful for future readers: https://stackoverflow.com/questions/39482436/why-calculating-mse-in-lasso-regression-gives-different-outputs https://stats.stackexchange.com/questions/27730/choice-of-k-in-k-fold-cross-validation – ELHL Jun 11 '17 at 10:52