The documentation for the glm() function states, regarding a factor response variable, that

the first level denotes failure and all others success.

I assume caret's train() function calls glm() under the hood when using method = 'glm', and therefore the same applies.

So in order to produce an interpretable model that is consistent with other models (i.e. the coefficients correspond to a success event), I must follow this convention.
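That convention can be checked with a small base-R sketch. The data are simulated and the names (`x`, `y`, `fit_yes`, `fit_no`) are made up for illustration; the only point is that reversing the factor levels flips the sign of every coefficient.

```r
set.seed(1)
n <- 200
x <- rnorm(n)

# Simulated outcome: larger x makes "Yes" more likely.
y <- factor(ifelse(runif(n) < plogis(1.5 * x), "Yes", "No"),
            levels = c("No", "Yes"))

# "No" is the first level, so glm() models P(y == "Yes").
fit_yes <- glm(y ~ x, family = binomial)

# With "Yes" moved to the first level, glm() models P(y == "No").
fit_no <- glm(relevel(y, ref = "Yes") ~ x, family = binomial)

coef(fit_yes)  # slope for x is positive
coef(fit_no)   # same magnitudes, opposite signs
```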

The problem is that, even though glm() (and thus caret's train() function) treats the second factor level as success, caret's resamples() function (and the $resample element of the fitted model) still treats the first level as the success / positive class. As a result, sensitivity and specificity are the opposite of what they should be if I want to use resamples() to compare against other models.

Example:

install.packages('ISLR')
library('ISLR')
library('caret') # needed for trainControl(), train() and confusionMatrix()
summary(Default)
levels(Default$default) # 'Yes' is the second level of the factor
glm_model <- glm(default ~ ., family = "binomial", data = Default)
summary(glm_model)

train_control <- trainControl(
    summaryFunction = twoClassSummary,
    classProbs = TRUE,
    method = 'repeatedcv',
    number = 5,
    repeats = 5,
    verboseIter = FALSE,
    savePredictions = TRUE)
set.seed(123)
caret_model <- train(default ~ ., data = Default, method = 'glm', metric='ROC', preProc=c('nzv', 'center', 'scale', 'knnImpute'), trControl = train_control)
summary(caret_model)
caret_model # shows Sens of ~0.99 and Spec of ~0.32
caret_model$resample # shows the same, but for each fold/repeat
# By this point the resampled metrics already treat 'No' (the first level) as the positive class,
# and that carries over to resamples(). Is there no way to specify the positive/success class in train()?

# Once I set 'Yes' as the positive class, the true sensitivity and specificity are calculated,
# but is there no way to do this for resamples()?
confusionMatrix(data = predict(caret_model, Default), reference = Default$default, positive = 'Yes')

I can see the correct sens/spec in confusionMatrix with positive = 'Yes' but what is the solution for resamples() so that I can accurately compare it against other models?

jmuhlenkamp
shaneker

1 Answer

Making "Yes" the first factor level makes caret treat it as the positive class, so the resampled sensitivity and specificity swap:

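library(forcats) # provides fct_relevel()
temp <- Default
temp$default <- fct_relevel(temp$default, "Yes") # move "Yes" to the first level
levels(temp$default)    # "Yes" "No"
levels(Default$default) # "No" "Yes" (the original data is unchanged)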

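# temp$default already has "Yes" as its first level, so relevel() is not needed in the formula.
# Using the plain column name also avoids any risk of `.` picking up `default` as a predictor.
caret_model <- train(default ~ ., data = temp, method = 'glm', metric='ROC', preProc=c('nzv', 'center', 'scale', 'knnImpute'), trControl = train_control)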
summary(caret_model)
caret_model 

This is based on page 272 of the book Applied Predictive Modeling:

The glm() function models the probability of the second factor level, so the function relevel() is used to temporarily reverse the factors levels.
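Releveling has a side effect: glm() then models the probability of "No", so the coefficient signs flip. One possible alternative, not taken from the book or from caret's documentation, is to leave the levels alone and swap the two metrics in a custom summary function. The sketch below assumes twoClassSummary() returns a named vector with ROC, Sens and Spec computed with the first level as positive; `secondLevelSummary` is a hypothetical helper name. Sensitivity for the second level equals specificity for the first (and vice versa), and the ROC AUC does not depend on which class is called positive.

```r
library(caret)

# Hypothetical helper: report Sens/Spec with the SECOND factor level as the positive class.
secondLevelSummary <- function(data, lev = NULL, model = NULL) {
  out <- twoClassSummary(data, lev = lev, model = model)
  # Swap the two rates; the AUC is passed through unchanged.
  c(ROC = out[["ROC"]], Sens = out[["Spec"]], Spec = out[["Sens"]])
}

# Same resampling setup as in the question, with only the summary function changed.
train_control <- trainControl(
    summaryFunction = secondLevelSummary,
    classProbs = TRUE,
    method = 'repeatedcv',
    number = 5,
    repeats = 5,
    verboseIter = FALSE,
    savePredictions = TRUE)
```

Passing this train_control to the original train(default ~ ., data = Default, ...) call should leave the glm coefficients describing "Yes" while $resample and resamples() report Sens and Spec with "Yes" as the positive class. Other models being compared would need the same summary function for the columns to mean the same thing.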

user1420372