2

I am using tidymodels to develop a logistic regression using the Palmer's Penguins dataset.

I believe that female should be the reference class since it is the first level of my outcome variable, and therefore a positive outcome for the model should female.

When I've built the model, the coefficient estimates seem to indicate that an increase in bill_depth_mm and bill_length_mm result in an increase in the likelihood of being a female, but when plotted the reverse relationship appears to be true.

So this would lead me to believe that male is the predicted outcome for the model, however when it comes to calculating specificity and sensitivity it treats the female outcome as the true positive outcome. Similarly, an ROC plot needs the reference level to be the predicted likelihood of a female outcome.

Why does it appear as though the estimates from the model output are reversed? Is there an option in tidymodels to specify what the estimates should relate to? Or even confirm what the estimates are related to?

Or is my understanding of the model incorrect and these estimates are correct - an increase in bill length and depth does increase the likelihood of the penguin being female?

penguins <- read_csv("Data/penguins.csv")

penguins <- penguins %>% 
  mutate(sex = factor(sex))

set.seed(57475)
penguins_splits <- initial_split(penguins,
                                 strata = sex)

penguins_training <- training(penguins_splits) 
penguins_testing <- testing(penguins_splits)

penguins_recipe <- recipe(sex ~ ., data = penguins_training) %>% 
  step_naomit(all_predictors()) %>%
  step_rm(rowid, skip = TRUE) %>%
  step_string2factor(all_nominal_predictors()) %>%
  step_mutate(year = factor(year)) %>%
  step_dummy(all_nominal_predictors())


logistic_model <- logistic_reg() %>% 
  set_engine("glm") %>%
  set_mode("classification")

logistic_wf <- workflow() %>% 
  add_model(logistic_model) %>% 
  add_recipe(penguins_recipe)

logistic_fit <- logistic_wf %>%
  fit(data = penguins_training)

logistic_fit %>%
  pull_workflow_fit %>%
  tidy()

ggplot(penguins_training, aes(x = bill_length_mm, colour = sex)) +
  geom_boxplot()

Output from the model object

> logistic_fit %>%
+   pull_workflow_fit %>%
+   tidy()
# A tibble: 11 x 5
   term               estimate std.error statistic      p.value
   <chr>                 <dbl>     <dbl>     <dbl>        <dbl>
 1 (Intercept)       -76.7      13.7        -5.58  0.0000000237
 2 bill_length_mm      0.831     0.176       4.72  0.00000238  
 3 bill_depth_mm       1.53      0.391       3.91  0.0000913   
 4 flipper_length_mm  -0.0290    0.0611     -0.474 0.635       
 5 body_mass_g         0.00593   0.00132     4.49  0.00000724  
 6 species_Chinstrap  -9.17      2.06       -4.45  0.00000843  
 7 species_Gentoo     -8.82      3.22       -2.74  0.00608     
 8 island_Dream        0.874     0.968       0.903 0.367       
 9 island_Torgersen   -0.237     0.974      -0.243 0.808       
10 year_X2008         -0.239     0.701      -0.341 0.733       
11 year_X2009         -0.762     0.736      -1.04  0.300  
  • If female is the reference level (as it should be by default because alphabetical order is used to define reference level), it is coded as 0, and male is coded as 1. Thus the positive outcomes or "successes" are the males, not the females. A positive coefficient estimate for a variable means that as that variable increases, the log odds of being male also increases. – qdread Apr 28 '23 at 10:29
  • @qdread Thanks, that's what I initially thought as well, however when you look at "Specificity" and "Sensitivity" or the "ROC Curve" they treat the "female" outcome as the success. This is what's causing my confusion! It seems to change what the success is between the metric tests and the model build. – Dominic Foy Apr 28 '23 at 10:40

1 Answers1

4

You're correct in our interpretations - this is due to a clash in conventions between the underlying engine (here stats::glm()) and yardstick. The model uses the first level as "failure" while yardstick uses the first level as "success". To alter the behavior of yardstick, set event_level = "second". (To alter the behavior of glm(), recode the factor.)

https://yardstick.tidymodels.org/reference/sens.html#relevant-level

  • 1
    Well, that's good to know. I don't see why yardstick doesn't follow the default of `glm` though. The documentation you linked to states "There is no common convention on which factor level should automatically be considered the "event" or "positive" result when computing binary classification metrics." Maybe not, but then why is the default set to the opposite of what `glm` and many other modeling functions use? – qdread Apr 28 '23 at 11:58