0

this is some of the data the downs.bc data at faraway package

downs.bc 
    age     m  r
1  17.0 13555 16
2  18.5 13675 15
3  19.5 18752 16
4  20.5 22005 22

why do I get the following error? How to fix it?

mod1=glm(cbind(age, 30-age)~m+r, family=binomial, downs.bc)

Error in family$linkfun(mustart) : Value 1.03226 out of range (0, 1)

In addition: Warning message:

In eval(family$initialize) : non-integer counts in a binomial glm!

zephryl
  • 14,633
  • 3
  • 11
  • 30
  • You likely have an age that is greater than 30. Binomial models are used to model True/False, Success/failures. You are not modelleing any of those cases here. – Oliver Feb 09 '23 at 12:31
  • `glm(..., family = binomial, ...)` is function for logistic regression model. Your dependent variable should be `0` and `1`. – Park Feb 09 '23 at 12:32
  • 1
    @Park you can also a two-column matrix of counts on the left-hand side of a binomial glm. – Allan Cameron Feb 09 '23 at 12:45

1 Answers1

2

With a logistic regression using glm, the term on the left-hand side of the equation can either be a TRUE / FALSE (or 1 / 0) variable indicating presence / absence, or it can be a two-column matrix indicating the number of positive / negative cases.

From the ?glm documents:

For binomial and quasibinomial families the response can also be specified as a factor (when the first level denotes failure and all others success) or as a two-column matrix with the columns giving the numbers of successes and failures.

If we look at the description of the boot::downs.bc data set, it tells us that the variables are:

age The average age of all mothers in the age category.

m The total number of live births to mothers in the age category.

r The number of cases of Down's syndrome.

So the correct formula would be

mod <- glm(cbind(r, m - r) ~ age, family = binomial, data = boot::downs.bc)

Which results in the following model, showing a highly significant increase in the probability of Down's syndrome as maternal age increases:

summary(mod)
#> 
#> Call:
#> glm(formula = cbind(r, m - r) ~ age, family = binomial, data = boot::downs.bc)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -3.4127  -1.9446   0.5464   2.1361   4.7681  
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) -10.563690   0.214485  -49.25   <2e-16 ***
#> age           0.137579   0.006474   21.25   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 625.21  on 29  degrees of freedom
#> Residual deviance: 184.03  on 28  degrees of freedom
#> AIC: 326.91
#> 
#> Number of Fisher Scoring iterations: 5

And we can see what this looks like using predict and plot:

plot(predict(mod, newdata = list(age = 16:50), type = 'response'), type = 'l',
     ylab = "Probability of Down's syndrome per live birth", 
     xlab = 'Maternal age')

enter image description here

Created on 2023-02-09 with reprex v2.0.2

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • thank you and how would What is the effect of a 5 year increase in age according to the above model? – fashionable Feb 09 '23 at 14:18
  • According to this model, the **log odds** of having a baby with Down's syndrome is estimated by `-10.563690 + age * 0.137579` A five-year increase in age increases the **log odds** of having a child with Down's syndrome by 5 * 0.137579, or 0.687895. Another way of saying this is that the **odds** increase by a factor of `exp(0.687895)`, which is 1.989523. In other words, the **odds** of having a child with Down's syndrome approximately doubles for every additional 5 years of the mother's age. – Allan Cameron Feb 09 '23 at 14:24
  • I was wondering if you could guide me on how to find OR of it and the OD. What is the probability of the event for a 20 year old? for a 30 year old? Calculate the Odds ratio. is it like this?-10.563690 + 30 * 0.137579= -6.43632 -10.563690 + 40* 0.137579= -5.06053 – fashionable Feb 11 '23 at 13:57