When conducting logistic regression, I believe that it's good practice is to use McFadden's or Tjur's R2, since both of those tend to be used with generalized linear models. mice::pool.r.squared
is designed to be only with lm
models. A previous StackOverflow user had the same question as you and it seems that the best function for a multiply-imputed glm()
model is mfc()
from the Github package glmice
. The function looks fairly simple and uses McFadden's R2, although the package hasn't been touched for a few years. That previous user wasn't able to get mfc()
to work, but it worked for me.
# install.packages("remotes")
# remotes::install_github("noahlorinczcomi/glmice")
library(glmice)
library(mice)
data(nhanes)
nhanes$hyp <- ifelse(nhanes$hyp == 2, 1, 0)
imp <- mice(nhanes, m = 10, seed = 500, printFlag = FALSE)
mod <- with(imp, glm(hyp ~ age + bmi, family = "binomial"))
# summary(pool(mod))
mcf(mod)
#> [1] "34.9656%"
It looks like there are fewer resources on calculating AUC for a multiply-imputed glm()
. I did find a vignette from the finalfit
package, which calculated area under the curve. Unfortunately, it calculated AUC for each imputation. There might be a way to pool the output, but I'm not sure how (hopefully another SO user might suggest an idea?).
library(finalfit)
mod %>%
getfit() %>%
purrr::map(~ pROC::roc(.x$y, .x$fitted)$auc)
# not pasting the output because it's a lot
small update
As of 1/23/23, I've noticed that the glmice
Github page has been taken down. I'm posting the mcf()
function here for reference.
#' Calculates McFadden's Pseudo R-Squared
#'
#' Returns McFadden's pseudo r-squared for logistic regression models performed on 'mice'-imputed data sets.
#' @param model a logit model from which you would like to return McFadden's pseudo r-squared. This can be a model created either with 'glm.mids()' or 'with()'
#' @return mcfs2: McFadden's pseudo r-squared
#' @export
mcf <- function (model) {
iterations <- model$call1$m
null_ds <- as.numeric()
res_ds <- as.numeric()
for (i in 1:iterations) {
null_ds[i] <- model$analyses[[i]]$null.deviance
res_ds[i] <- model$analyses[[i]]$deviance
}
ds <- cbind(as.numeric(null_ds), as.numeric(res_ds))
m_null <- mean(null_ds)
m_res <- mean(res_ds)
mcfs <- round(((1 - (m_res / m_null)) * 100), 4)
mcfs <- paste0(mcfs, "%")
# end
return(mcfs)
}