
I am trying to plot a binary (logistic) regression curve with ggplot, using a real dataset for practice. The question: is distance in kilometers a predictor of choosing a car as the means of transport to the soccer stadium?

The variable A2 was dichotomized (1 = Auto (car), 0 = kein Auto (no car)) and stored as A2_auto:

dataset %>%
  mutate(A2_auto = car::recode(.$A2,
                               "1 = 1; 2:9 = 0",
                               as.factor = FALSE)) -> dataset

dataset$A2_auto <- factor(dataset$A2_auto, labels = c("kein Auto",
                                                        "Auto"))
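As an aside, here is a minimal sketch of the same dichotomization that keeps the result numeric (0/1), which is the form a binomial fit inside geom_smooth() can use. It swaps car::recode() for a plain base R comparison. The A2 codes are a few values copied from the dput() output further down. If the real column is a haven_labelled vector, the assumption is that `==` works on it directly; if it does not, haven::zap_labels() can strip the labels first.

```r
# A few A2 codes copied from the dput() output in the question
# (1 = PKW/car, 2:9 = other means of transport)
A2 <- c(1, 1, 6, 6, 6, 7, 7, 8, 8, 3)

# Dichotomize to a *numeric* 0/1 vector: TRUE/FALSE becomes 1/0
A2_auto <- as.numeric(A2 == 1)

# Optional labelled copy for tables; keep the numeric one for modelling and plotting
A2_auto_label <- factor(A2_auto, levels = c(0, 1),
                        labels = c("kein Auto", "Auto"))
```

Keeping two columns avoids having to convert back and forth: the factor is only for display, the numeric vector goes into the model and the plot.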
 

After calculating the regression coefficients (significant, but with a very low odds ratio), I wanted to plot the regression curve with ggplot:

ggplot(data=dataset, aes(x=A21, y=A2_auto)) + 
  geom_point(alpha=.5) +
  stat_smooth(method="glm.fit", se=FALSE, method.args = list(family=binomial))

But I get this warning message:

`geom_smooth()` using formula 'y ~ x'
Warning message:
Computation failed in `stat_smooth()`:
unused argument (data = data)

There is no regression line in the scatterplot, and I can't figure out why:

[Image: binary regression plot]
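The "unused argument (data = data)" message is consistent with stat_smooth() calling the chosen method with a `data =` argument. stats::glm.fit() is the low-level fitter that takes a model matrix `x` and a response `y` and has no `data` argument, whereas stats::glm() has one. A quick check in base R:

```r
# Compare the argument lists of the two fitting functions from the stats package
glm_fit_args <- names(formals(stats::glm.fit))
glm_args     <- names(formals(stats::glm))

"data" %in% glm_fit_args  # FALSE: glm.fit(x, y, ...) has no data argument
"data" %in% glm_args      # TRUE:  glm(formula, family, data, ...) does
```

This is why `method = "glm"` is the value to pass to stat_smooth()/geom_smooth().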

Here is the structure of the dataframe:

'data.frame':   689 obs. of  3 variables:
 $ A2     : dbl+lbl [1:689] 1, 1, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 7, 7, 3, 6, 6, 6, 6, 6, 6, 6...
   ..@ label        : chr "Mit welchem Verkehrsmittel legen Sie die größte Distanz zum Stadion zurück, wenn Sie ein Bundesliga-Heimspiel b"| __truncated__
   ..@ format.spss  : chr "F40.0"
   ..@ display_width: int 0
   ..@ labels       : Named num  1 2 3 4 5 6 7 8 9
   .. ..- attr(*, "names")= chr [1:9] "PKW" "Bahn (Fernverkehr)" "Bahn (Nahverkehr)" "Fernbus" ...
 $ A21    : num  1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "label")= chr "Distanz in km"
  ..- attr(*, "format.spss")= chr "F8.2"
  ..- attr(*, "display_width")= int 0
 $ A2_auto: Factor w/ 2 levels "kein Auto","Auto": 2 2 1 1 1 1 1 1 1 1 ...

Thanks for the help!

Edit 1: Here is the output of `dput(head(dataset, 50))`:

structure(list(A2 = structure(c(1, 1, 6, 6, 6, 7, 7, 7, 7, 7, 
7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 6, 6, 6, 6, 7, 7, 7, 7, 7, 
7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 7, 7, 3, 6, 6, 6, 6, 6, 6), label = "Mit welchem Verkehrsmittel legen Sie die größte Distanz zum Stadion zurück, wenn Sie ein Bundesliga-Heimspiel besuchen? - Selected Choice", format.spss = "F40.0", display_width = 0L, labels = c(PKW = 1, 
`Bahn (Fernverkehr)` = 2, `Bahn (Nahverkehr)` = 3, Fernbus = 4, 
`Fan-/Reisebus` = 5, ÖPNV = 6, Fahrrad = 7, `Zu Fuß` = 8, Sonstige = 9
), class = c("haven_labelled", "vctrs_vctr", "double")), A21 = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 
6, 6, 6, 6, 6, 6, 6), A2_auto = structure(c(2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("kein Auto", 
"Auto"), class = "factor")), row.names = c(NA, 50L), class = "data.frame")

When I change the method from glm.fit to glm, a different warning comes up:

ggplot(data=dataset, aes(x=A21, y=A2_auto)) + 
  geom_point(alpha=.5) +
  stat_smooth(method="glm", se=FALSE, method.args = list(family=binomial))

Output:

`geom_smooth()` using formula 'y ~ x'
Warning messages:
1: glm.fit: algorithm did not converge
2: Computation failed in `stat_smooth()`:
y values must be 0 <= y <= 1
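The second message comes from the binomial family, which only accepts responses between 0 and 1. When y is mapped to a factor, ggplot2 puts it on a discrete axis and the smoothing stat receives the axis positions (1 for "kein Auto", 2 for "Auto") rather than 0/1. A minimal sketch reproducing this with glm() directly; the distances and the 2/1 pattern are copied from the first rows of the dput() output, and `y_positions` is a hypothetical stand-in for what the stat receives:

```r
# Distances in km from the first rows of the dput() output
x <- c(1, 1, 1, 2, 2, 4, 6, 6)
# Discrete-axis positions of the factor levels: 1 = "kein Auto", 2 = "Auto"
y_positions <- c(2, 2, 1, 1, 1, 1, 1, 1)

# binomial() rejects any response above 1, so this fails
err <- tryCatch(
  glm(y_positions ~ x, family = binomial),
  error = function(e) conditionMessage(e)
)
err  # error message: "y values must be 0 <= y <= 1"
```

Note that calling glm() yourself with the factor A2_auto as the response works, because the binomial family converts a factor to "not the first level". The failure only appears once ggplot2 has turned the factor into axis positions.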

I also dichotomized the variable to 0 and 1 (without factor labels), and the same error occurs:

dataset %>%
  mutate(A2_auto = car::recode(.$A2,
                               "1 = 1; 2:9 = 0",
                               as.factor = TRUE)) -> dataset

Output:

`geom_smooth()` using formula 'y ~ x'
Warning messages:
1: glm.fit: algorithm did not converge
2: Computation failed in `stat_smooth()`:
y values must be 0 <= y <= 1

[Image: scatterplot of the dichotomized variable]
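One thing worth checking here: the recode call above passes `as.factor = TRUE`, so the result is presumably still a factor, just with levels "0" and "1". Converting such a factor with as.numeric() returns the internal level codes, not the values. A small sketch with a hypothetical vector `f`:

```r
# A 0/1 code stored as a factor (levels "0" and "1")
f <- factor(c(1, 1, 0, 0, 0))

codes  <- as.numeric(f)                # 2 2 1 1 1: factor codes, not the values
values <- as.numeric(as.character(f))  # 1 1 0 0 0: the intended 0/1 values

class(f)       # "factor": check this before plotting
class(values)  # "numeric"
```

Running class() or str() on the response column before calling ggplot() is a cheap way to catch this.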

I'll try to make my example reproducible with mtcars, as suggested in the comments.

pmyth
  • I don't think `glm.fit` is a valid value for the `method` parameter. Did you mean `glm`? – dario Oct 20 '21 at 10:26
  • And could you make your example reproducible? The easiest way would be to rewrite your question using one of the base datasets (e.g. `mtcars`). – dario Oct 20 '21 at 10:28
  • Please don't just share the output of str(); use dput(head(yourdata, 20)) instead and paste the result into your question. – deschen Oct 20 '21 at 10:28
  • I think you need to change `glm.fit` to `glm` and make sure that your response variable y is between 0 and 1. – dario Oct 20 '21 at 10:34
  • I have added your suggestions in the edit, but with glm a different error comes up, even when y is between 0 and 1. I will try to reproduce the error with mtcars. Otherwise, is there some way I can share my dataset? @deschen – pmyth Oct 20 '21 at 11:34

1 Answer


I think I have found the solution. After comparing my data with the mtcars dataset, I took a closer look at the variable A2_auto and realized that it was not numeric after all. So I converted it to numeric and dichotomized it again. (With a factor on the y axis, ggplot2 hands the axis positions 1 and 2 to the model instead of 0 and 1, which the binomial family rejects.) Also, as described in the comments, "glm" (not "glm.fit") was the correct method. Thanks again for the advice in the comments! It works now.
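A minimal reproducible sketch of the working setup with mtcars, as suggested in the comments. Here `am` (0 = automatic, 1 = manual) plays the role of A2_auto and `wt` the role of A21; that pairing is only an illustration, not the question's data. The commented lines at the end show how the same idea would apply to `dataset`, and are untested against the real data.

```r
library(ggplot2)

# mtcars$am is already numeric 0/1, so it can go straight into a binomial GLM
p <- ggplot(mtcars, aes(x = wt, y = am)) +
  geom_point(alpha = 0.5) +
  geom_smooth(
    method = "glm",                         # glm(), not glm.fit()
    method.args = list(family = binomial),  # logistic regression
    formula = y ~ x,                        # stated explicitly to silence the message
    se = FALSE
  )
p

# For the question's data (untested sketch): use a numeric 0/1 response ...
# dataset$A2_auto_num <- as.numeric(dataset$A2 == 1)
# ggplot(dataset, aes(x = A21, y = A2_auto_num)) + ...same layers as above...
# ... or keep the factor column and map y = as.numeric(A2_auto) - 1
```

The `as.numeric(A2_auto) - 1` variant relies on "kein Auto" being the first level, which matches the factor() call in the question.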

pmyth