
I am trying to plot the result of a logistic regression on the log-odds scale.

load(url("https://github.com/bossaround/question/raw/master/logisticregressdata.RData"))

ggplot(D, aes(Year, as.numeric(Vote), color = as.factor(Male))) +
  stat_smooth( method="glm", method.args=list(family="binomial"), formula = y~x + I(x^2), alpha=0.5, size = 1, aes(fill=as.factor(Male))) +
  xlab("Year") 

However, this plot is on a 0 to 1 scale, which I assume is the probability scale (correct me if I am wrong).

What I really want is to plot it on the log-odds scale, which is the scale the logistic regression reports before the result is converted to a probability.

Ideally, I want to plot the relationship between Vote and Year, by Male, after controlling for Foreign, in a model like this:

   Model <-  glm(Vote ~ Year + I(Year^2) + Male + Foreign, family="binomial", data=D)

I could manually draw the line based on summary(Model), but I also want to plot the confidence interval.

Something like the image on page 44 of this document I found online: http://www.datavis.ca/papers/CARME2015-2x2.pdf. Mine would have a quadratic curve.

Thank you!

Chuck C
    `predict(Model, type="link", se.fit=TRUE)` will give you predictions on the log odds scale. `se.fit=TRUE` will include the standard error of the predictions. The predictions will be for the observations used to fit the model. To get predictions at other values of the independent variables (IVs), use the `newdata` argument of predict and include a new data frame (that you create) with the IV values at which you want predictions of the outcome (for example, `predict(Model, newdata=my_data_frame, type="link", se.fit=TRUE)`). – eipi10 Sep 28 '17 at 05:39
    Once you have the predictions, you can plot them using `geom_line` in ggplot. Use `geom_ribbon` to plot the confidence interval. – eipi10 Sep 28 '17 at 05:40
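As a quick check of what `type="link"` returns compared with `type="response"`, here is a minimal sketch on simulated data. The data frame `sim` and the model `fit_sim` are made up for illustration (only the column names mirror the question), so the numbers say nothing about the real data.

```r
# Simulated stand-in data: column names follow the question, values are invented.
set.seed(1)
n <- 500
sim <- data.frame(
  Year    = sample(2:10, n, replace = TRUE),
  Male    = rbinom(n, 1, 0.5),
  Foreign = rbinom(n, 1, 0.3)
)
sim$Vote <- rbinom(n, 1, plogis(1 - 0.6 * sim$Year + 0.04 * sim$Year^2 + 0.3 * sim$Male))

fit_sim <- glm(Vote ~ Year + I(Year^2) + Male + Foreign, family = "binomial", data = sim)

log_odds <- predict(fit_sim, type = "link")      # linear predictor, unbounded
prob     <- predict(fit_sim, type = "response")  # probability, between 0 and 1

# The two scales are linked by the logit transform: log_odds = log(prob / (1 - prob)).
all.equal(log_odds, qlogis(prob))
all.equal(prob, plogis(log_odds))
```

So the 0 to 1 curve drawn by `stat_smooth` is the `type="response"` version, and applying `qlogis()` to it (or asking `predict` for `type="link"`) gives the log odds.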

3 Answers


To plot the predictions of a model with several variables, fit the model, predict on new data, and plot the predictions:

Model <-  glm(Vote ~ Year + I(Year^2) + Male + Foreign, family="binomial", data=D)
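# Build a grid of predictor values so that the fitted curve is smooth
for_pred <- expand.grid(Year = seq(from = 2, to = 10, by = 0.1), Male = c(0, 1), Foreign = c(0, 1))

# type = "link" returns predictions on the log-odds scale; se.fit = TRUE adds their standard errors
for_pred <- cbind(for_pred, predict(Model, for_pred, type = "link", se.fit = TRUE))
# if the probability scale is needed instead, use type = "response"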

library(ggplot2)
ggplot(for_pred, aes(Year, fit, color = as.factor(Male))) +
  geom_line() +
  xlab("Year")+
  facet_wrap(~Foreign)  + #important step - check also how it looks without it
  # the ribbon spans fit +/- 1 standard error; multiply se.fit by 1.96 for an approximate 95% interval
  geom_ribbon(aes(ymax = fit + se.fit, ymin = fit - se.fit, fill = as.factor(Male)), alpha = 0.2)

# To drop the ribbon outline, set `color = NA` in geom_ribbon, or use `inherit.aes = FALSE`
# (in that case, supply the data and the full `aes` mapping to geom_ribbon).
# To give the ribbon a fixed fill instead of a mapped one, set `fill` outside of `aes`, e.g. `fill = "grey80"`.

[Figure: fitted log odds against Year with standard-error ribbons, colored by Male, one panel per value of Foreign]
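If a 95% confidence band is wanted rather than plus or minus one standard error, one common approach is a Wald interval built on the link scale and, if needed, back-transformed with `plogis()`. The sketch below uses simulated data; `sim`, `fit_sim`, `grid` and the column names `lower`, `upper`, `prob`, `prob_lower`, `prob_upper` are illustrative choices, not part of the original answer.

```r
# Simulated stand-in data: column names follow the question, values are invented.
set.seed(2)
n <- 1000
sim <- data.frame(
  Year    = sample(2:10, n, replace = TRUE),
  Male    = rbinom(n, 1, 0.5),
  Foreign = rbinom(n, 1, 0.3)
)
sim$Vote <- rbinom(n, 1, plogis(1 - 0.6 * sim$Year + 0.04 * sim$Year^2 + 0.3 * sim$Male))
fit_sim <- glm(Vote ~ Year + I(Year^2) + Male + Foreign, family = "binomial", data = sim)

# Prediction grid, as in the answer above
grid <- expand.grid(Year = seq(2, 10, by = 0.1), Male = c(0, 1), Foreign = c(0, 1))
pred <- predict(fit_sim, newdata = grid, type = "link", se.fit = TRUE)

# Approximate 95% Wald interval on the log-odds scale
z <- qnorm(0.975)
grid$fit   <- pred$fit
grid$lower <- pred$fit - z * pred$se.fit
grid$upper <- pred$fit + z * pred$se.fit

# Optional: the same curve and band on the probability scale
grid$prob       <- plogis(grid$fit)
grid$prob_lower <- plogis(grid$lower)
grid$prob_upper <- plogis(grid$upper)

# Plot on the log-odds scale (skipped if ggplot2 is not installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(grid, aes(Year, fit, colour = factor(Male), fill = factor(Male))) +
    geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, colour = NA) +
    geom_line() +
    facet_wrap(~Foreign) +
    labs(y = "Log odds of Vote", colour = "Male", fill = "Male")
  print(p)
}
```

Building the interval on the link scale and transforming the endpoints keeps the probability band inside 0 and 1, which adding and subtracting standard errors directly on the probability scale does not guarantee.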

Also check out the sjPlot package.

missuse

Another approach uses augment() from the broom package:

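library(broom)    # augment()
library(dplyr)    # %>%, rename(), glimpse()
library(ggplot2)

Model <- glm(Vote ~ Year + I(Year^2) + Male + Foreign, family = "binomial", data = D)
summary(Model)

# augmented data frame: one row per observation, with fitted values on the link (log-odds) scale
model.df.1 <- augment(Model) %>% rename(log_odds = `.fitted`,
                                        Sex = Male)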

glimpse(model.df.1)
Observations: 46,398
Variables: 13
$ .rownames  <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21"...
$ Vote       <dbl> 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, ...
$ Year       <int> 2, 3, 4, 5, 2, 3, 2, 3, 4, 5, 6, 2, 2, 2, 3, 4, 2, 3, 4, 5, 6, 7, 8, 9, 2, 3, 4, 5, 2, 3, 2, 3, 4, 5, 2, 3, 4, 5, ...
$ I.Year.2.  <S3: AsIs>  4,  9, 16, 25,  4,  9,  4,  9, 16, 25, 36,  4,  4,  4,  9, 16,  4,  9, 16, 25, 36, 49, 64, 81,  4,  9, 16, 2...
$ Sex        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
$ Foreign    <int> 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ log_odds   <dbl> -0.01910985, -0.68184753, -1.14317053, -1.40307885, 0.26930939, -0.39342829, -0.01910985, -0.68184753, -1.14317053...
$ .se.fit    <dbl> 0.01675017, 0.01466136, 0.01790972, 0.02058514, 0.01931826, 0.01777691, 0.01675017, 0.01466136, 0.01790972, 0.0205...
$ .resid     <dbl> -1.1693057, -0.9047053, -0.7439452, 1.8016037, -1.2937083, 1.3483961, -1.1693057, -0.9047053, -0.7439452, -0.66303...
$ .hat       <dbl> 7.013561e-05, 4.794678e-05, 5.879536e-05, 6.711744e-05, 9.162739e-05, 7.602458e-05, 7.013561e-05, 4.794678e-05, 5....
$ .sigma     <dbl> 1.124879, 1.124884, 1.124886, 1.124861, 1.124876, 1.124874, 1.124879, 1.124884, 1.124886, 1.124887, 1.124860, 1.12...
$ .cooksd    <dbl> 1.376354e-05, 4.849628e-06, 3.749311e-06, 5.461011e-05, 2.399355e-05, 2.253792e-05, 1.376354e-05, 4.849628e-06, 3....
$ .std.resid <dbl> -1.1693467, -0.9047270, -0.7439671, 1.8016642, -1.2937676, 1.3484474, -1.1693467, -0.9047270, -0.7439671, -0.66305...


#visualise

# Sex is stored as an integer, so convert it to a factor to get one line per group
ggplot(model.df.1, aes(Year, log_odds, colour = factor(Sex))) +
  geom_line() +
  geom_smooth(se = TRUE) +
  facet_wrap(~ Foreign)

Which gives:

[Figure: log odds against Year, colored by Sex, one panel per value of Foreign]
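Note that the band drawn by `geom_smooth()` in the plot above comes from smoothing the fitted values, not from the model's own standard errors. A hedged sketch of using the `.se.fit` column instead follows. It assumes a recent broom release, where (as far as I know) `augment()` on a glm only returns `.se.fit` when `se_fit = TRUE` is passed; the simulated `sim`, `fit_sim`, `grid` and `aug` objects are invented for illustration.

```r
# Simulated stand-in data: column names follow the question, values are invented.
set.seed(3)
n <- 1000
sim <- data.frame(
  Year    = sample(2:10, n, replace = TRUE),
  Male    = rbinom(n, 1, 0.5),
  Foreign = rbinom(n, 1, 0.3)
)
sim$Vote <- rbinom(n, 1, plogis(1 - 0.6 * sim$Year + 0.04 * sim$Year^2 + 0.3 * sim$Male))
fit_sim <- glm(Vote ~ Year + I(Year^2) + Male + Foreign, family = "binomial", data = sim)

grid <- expand.grid(Year = seq(2, 10, by = 0.1), Male = c(0, 1), Foreign = c(0, 1))

# Skipped if broom is not installed
if (requireNamespace("broom", quietly = TRUE)) {
  # For a glm, .fitted is on the link (log-odds) scale by default
  aug <- broom::augment(fit_sim, newdata = grid, se_fit = TRUE)
  aug$lower <- aug$.fitted - qnorm(0.975) * aug$.se.fit
  aug$upper <- aug$.fitted + qnorm(0.975) * aug$.se.fit

  if (requireNamespace("ggplot2", quietly = TRUE)) {
    library(ggplot2)
    p <- ggplot(aug, aes(Year, .fitted, colour = factor(Male), fill = factor(Male))) +
      geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, colour = NA) +
      geom_line() +
      facet_wrap(~Foreign) +
      labs(y = "Log odds of Vote", colour = "Sex", fill = "Sex")
    print(p)
  }
}
```

Predicting on a grid rather than on the original rows also avoids overplotting tens of thousands of identical fitted values.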

Edu

Your approach is correct, but you first need to generate predictions from the model you have built, something like this:

ModelPredictions <- predict(Model , type="response")

After that, you can plot using ggplot:

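# refer to Vote by name inside aes() instead of D$Vote, so that faceting lines up with the data
ggplot(D, aes(x = ModelPredictions, y = Vote)) +
  geom_point() +
  stat_smooth(method = "glm", se = FALSE, method.args = list(family = binomial)) +
  facet_wrap(~ Foreign)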
double-beep