0

I am running a logistic regression to see how these factors/variables affect an outcome (Neurological Complication).

How can I obtain the c-statistic -- also known as the area under the Receiver Operating Characteristic (AUROC) Curve?

    NeuroLogit2 <- glm(Neurologic Complication? ~ HTN + stroke + Gender + Embol + Drain, data=Tevar.new, family=binomial)
    summary(NeuroLogit2) 
bdg67
  • 33
  • 7

2 Answers2

1

Well, obviously I don't have your data, so let's make some up. Here, we'll pretend we're modelling the probability of people catching a cold in any given year based on age and sex. Our outcome variable is just a 1 for "caught a cold" and 0 for "didn't catch a cold"

set.seed(69)
outcome <- c(rbinom(1000, 1, seq(0.4, 0.6, length.out = 1000)),
             rbinom(1000, 1, seq(0.3, 0.5, length.out = 1000)))
sex     <- rep(c("M", "F"), each = 1000)
age     <- rep((601:1600)/20, 2)

df      <- data.frame(outcome, age, sex)

Now we'll create the model and have a look at it:

my_mod  <- glm(outcome ~ age + sex, data = df, family = binomial())

summary(my_mod)
#> 
#> Call:
#> glm(formula = outcome ~ age + sex, family = binomial(), data = df)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -1.3859  -1.0993  -0.8891   1.1847   1.5319  
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -1.20917    0.18814  -6.427 1.30e-10 ***
#> age          0.01346    0.00317   4.246 2.18e-05 ***
#> sexM         0.61000    0.09122   6.687 2.28e-11 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 2760.1  on 1999  degrees of freedom
#> Residual deviance: 2697.1  on 1997  degrees of freedom
#> AIC: 2703.1
#> 
#> Number of Fisher Scoring iterations: 4

Looks good. Older people and men are more likely to catch colds.

Now suppose we wanted to use this model to get a prediction of whether someone of a given age and sex will catch a cold in the next year. If we use the predict function with type = "response", we get a probability estimate for each of the people in our data frame based on their age and sex.

predictions <- predict(my_mod, type = "response")

We can use these probabilities to construct our ROC. Here I'll use the pROC package to help:

library(pROC)

roc(outcome, predictions)
#> Setting levels: control = 0, case = 1
#> Setting direction: controls < cases
#> 
#> Call:
#> roc.default(response = outcome, predictor = predictions)
#> 
#> Data: predictions in 1079 controls (outcome 0) < 921 cases (outcome 1).
#> Area under the curve: 0.6027

So the area under the ROC is 60.27%. We can plot the ROC itself to see what this looks like:

library(ggplot2)

ggroc(roc(outcome, predictions)) +
  theme_minimal() + 
  ggtitle("My ROC curve") + 
  geom_segment(aes(x = 1, xend = 0, y = 0, yend = 1), color="grey", linetype="dashed")
#> Setting levels: control = 0, case = 1
#> Setting direction: controls < cases

Created on 2020-06-07 by the reprex package (v0.3.0)

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • for some reason the round function in "round(predict(my_mod, type = "response"))" is making everything 0s and the AUC is coming out as 0.5. is this supposed to happen? my variables are all categorial (1s and 0s) not continuous like the "age" example you used -- perhaps this changes things? – bdg67 Jun 07 '20 at 02:08
  • 1
    @bdg67 you don't need to have a binary outcome prediction. If you just want the AUC for the model you can remove the "round". I have updated my answer - apologies if I led you astray there. – Allan Cameron Jun 07 '20 at 04:23
  • That is much better! Now my AUC is 0.7951 - is this the same as the c-statistic? Also, running into one other problem that you might be able to provide some insight on. Using this for another variable (mortality): When I get to the roc(outcome, predictions) step, R gives me an error that says "Response and predictor must be vectors of the same length." even though they are both categorical. Any idea how to address this? – bdg67 Jun 07 '20 at 17:33
  • @bdg67 yes, c-statistic is the same as AUC. In your mortality data, you need a prediction for each patient and a measured outcome for the same patients, so the two vectors need to be the same length: one prediction and one outcome per patient – Allan Cameron Jun 07 '20 at 17:40
  • Yep, there were a few missing values that I had to fix. Last thing: How can I go about finding the 95% CI and Odds Ratios of the logistic regression you did? When I ran them using "exp(cbind(OR = coef(NeuroLogit2), confint(NeuroLogit2)))" (for your data, NeuroLogit2 would be my_mod) I kept getting weird values with e's. Any ideas? – bdg67 Jun 08 '20 at 15:35
  • values with e's --- also known as exponents haha – bdg67 Jun 08 '20 at 18:33
0

While there are indeed several contributed packages which compute this, it can also be accomplished via the survival package's concordance method. survival is a 'recommended' package which means it is likely already bundled with your R installation.

survival::concordance(NeuroLogit2)
B. Tyner
  • 83
  • 5