0

I've read a few Q&As about this, but am still not sure I understand, why the coefficients from glmnet and caret models based on the same sample and the same hyper-parameters are slightly different. Would greatly appreciate an explanation!

I am using caret to train a ridge regression:

library(ISLR)
Hitters = na.omit(Hitters)
x = model.matrix(Salary ~ ., Hitters)[, -1] #Dropping the intercept column.
y = Hitters$Salary

set.seed(0)
train = sample(1:nrow(x), 7*nrow(x)/10)

library(caret)
set.seed(0)
train_control = trainControl(method = 'cv', number = 10)
grid = 10 ^ seq(5, -2, length = 100)
tune.grid = expand.grid(lambda = grid, alpha = 0)
ridge.caret = train(x[train, ], y[train],
                    method = 'glmnet',
                    trControl = train_control,
                    tuneGrid = tune.grid)
ridge.caret$bestTune
# alpha is 0 and best lambda is 242.0128

Now, I use the lambda (and alpha) found above to train a ridge regression for the whole data set. At the end, I extract the coefficients:

ridge_full <- train(x, y,
                    method = 'glmnet',
                    trControl = trainControl(method = 'none'), 
                    tuneGrid = expand.grid(
                      lambda = ridge.caret$bestTune$lambda, alpha = 0)
                    )
coef(ridge_full$finalModel, s = ridge.caret$bestTune$lambda)

Finally, using exactly the same alpha and lambda, I try to fit the same ridge regression using glmnet package - and extract coefficients:

library(glmnet)
ridge_full2 = glmnet(x, y, alpha = 0, lambda = ridge.caret$bestTune$lambda)
coef(ridge_full2)
user3245256
  • 1,842
  • 4
  • 24
  • 51

1 Answers1

2

The reason is the fact the exact lambda you specified was not used by caret. You can check this by:

ridge_full$finalModel$lambda

closest values are 261.28915 and 238.07694.

When you do

coef(ridge_full$finalModel, s = ridge.caret$bestTune$lambda)

where s is 242.0128 the coefficients are interpolated from the coefficients actually calculated.

Wheres when you provide lambda to the glmnet call the model returns exact coefficients for that lambda which differ only slightly from the interpolated ones caret returns.

Why this happens:

when you specify one alpha and one lambda for a fit on all of the data caret will actually fit:

   fit = function(x, y, wts, param, lev, last, classProbs, ...) {
                    numLev <- if(is.character(y) | is.factor(y)) length(levels(y)) else NA

                    theDots <- list(...)

                    if(all(names(theDots) != "family")) {
                      if(!is.na(numLev)) {
                        fam <- ifelse(numLev > 2, "multinomial", "binomial")
                      } else fam <- "gaussian"
                      theDots$family <- fam
                    }

                    ## pass in any model weights
                    if(!is.null(wts)) theDots$weights <- wts

                    if(!(class(x)[1] %in% c("matrix", "sparseMatrix")))
                      x <- Matrix::as.matrix(x)

                    modelArgs <- c(list(x = x,
                                        y = y,
                                        alpha = param$alpha),
                                   theDots)

                    out <- do.call(glmnet::glmnet, modelArgs)
                    if(!is.na(param$lambda[1])) out$lambdaOpt <- param$lambda[1]
                    out
                  }

this was taken from here.

in your example this translates to

fit <- glmnet::glmnet(x, y,
                       alpha = 0)

lambda <- unique(fit$lambda)

these lambda values correspond to ridge_full$finalModel$lambda:

all.equal(lambda, ridge_full$finalModel$lambda)
#output
TRUE
missuse
  • 19,056
  • 3
  • 25
  • 47