5

Not sure if this more of a statistics question but the closest similar problem I could find is here, although I couldn't get it to work for my case.

I am trying to develop a pooled, penalized logistic regression model. I used mice to create a mids object and then fit a model to each dataset using caret repeated cross-validation with elastic net regression (glmnet) to tune parameters. The fitted object is not of class "mira" but I think I fixed that by changing the object class with the right list items. The major issue is that glmnet does not have an associated vcov method, which is required by pool().

I would like to use penalized regression based on the amount of variables and uncertainty over which ones are the best predictors. My data consists of 4x numeric variables and 9x categorical variables of varying levels and I anticipate including interactions.

Does anyone know how I might be able to create my own vcov method or otherwise address this issue? I am not sure if this is possible.

Example data and code are enclosed, noting that I am not able to share the actual data.

library(mice)
library(caret)

dat <- as.data.frame(list(time=c(4,3,1,1,2,2,3,5,2,4,5,1,4,3,1,1,2,2,3,5,2,4,5,1),
                      status=c(1,1,1,0,2,2,0,0,NA,1,2,0,1,1,1,NA,2,2,0,0,1,NA,2,0),
                      x=c(0,2,1,1,NA,NA,0,1,1,2,0,1,0,2,1,1,NA,NA,0,1,1,2,0,1),
                      sex=c("M","M","M","M","F","F","F","F","M","F","F","M","F","M","M","M","F","F","M","F","M","F","M","F")))

imp <- mice(dat,m=5, seed=192)

control = trainControl(method = "repeatedcv", 
                          number = 10, 
                          repeats=3, 
                          verboseIter = FALSE)

mod <- list(analyses=vector("list", imp$m))

for(i in 1:imp$m){
  mod$analyses[[i]] <- train(sex ~ .,
                           data = complete(imp, i),
                           method = "glmnet",
                           family="binomial",
                           trControl = control,
                           tuneLength = 10, 
                           metric="Kappa")
}

obj <- as.mira(mod)

obj <- list(call=mod$analyses[[1]]$call, call1=imp$call, nmis=imp$nmis, analyses=mod$analyses)

oldClass(obj) <- "mira"

pool(obj)

Produces:

Error in pool(obj) : Object has no vcov() method.

jmuhlenkamp
  • 2,102
  • 1
  • 14
  • 37
Jay
  • 51
  • 2
  • As I understand it, `glmnet` will not have a single variance-covariance matrix, but will rather have multiple models, each of which might have something like a `vcov`-result. I think you need to dig deeper into the underlying algorithm. So , yes, to your lead-in hypothesis ... I think this is more a methods/stats-question and am voting to migrate to CrossValidated.com – IRTFM Apr 15 '18 at 01:14

0 Answers0