0

I have built a logistic regression with polynomial relationship with variable age, which I include in the model as an interaction with variable sex. I also have other independent and control variables.

The main question is: How do I find a turning point for age variable for female and male separately?

The model looks like this:

model <- glm(success ~ poly(age,2, raw = TRUE):sex + ..., 
               data = data, family = "binomial")
summary(model)
...
Coefficients:
                                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)                         -9.701e+00  1.525e+00  -6.362 1.99e-10 ***
poly(age, 2, raw = TRUE)1:sexmixed   4.245e-02  3.492e-02   1.216 0.224112    
poly(age, 2, raw = TRUE)2:sexmixed  -2.285e-05  4.216e-04  -0.054 0.956773    
poly(age, 2, raw = TRUE)1:sexfemale  1.487e-01  3.608e-02   4.122 3.76e-05 ***
poly(age, 2, raw = TRUE)2:sexfemale -1.937e-03  4.527e-04  -4.278 1.88e-05 ***
poly(age, 2, raw = TRUE)1:sexmale    6.715e-02  3.116e-02   2.155 0.031164 *  
poly(age, 2, raw = TRUE)2:sexmale   -4.762e-04  3.179e-04  -1.498 0.134089    
---

The dependent variable represents success. mixed is a reference level for sex (here mixed refers to observations that are related to group of people combining both males and females, but I am only interested to find a turning point for female and male).

From the summary I can calculate:

  • the turning point of age for female: 0.1487/(2*0.001937)=38 years old
  • the turning point of age for male: 0.06715/(2*0.0004762)=70.5 years old.

But it doesn't seem to be right, because when I visualise it below:

ggplot(data %>% filter(sex %in% c('female', 'male')), 
       aes(age, success, color = sex)) + 
  geom_point(position = position_jitter(height = 0.007, width = 0)) +
  stat_smooth(method = "glm", method.args = list(family = binomial),  
              formula = y ~ poly(x,2), alpha = 0.2, size=0.5, aes(fill = sex))

I get a plot, showing that turning point for men must be at ~58 years old (for female at ~36).

If I build model including sex separately, I receive this:

model <- glm(success ~ poly(age,2, raw = TRUE):sex + sex + ..., 
               data = data, family = "binomial")
summary(model)
...
Coefficients:
                                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)                         -1.799e+01  4.511e+00  -3.988 6.67e-05 ***
poly(age, 2, raw = TRUE)1:sexmixed   3.634e-01  1.675e-01   2.169 0.030084 *  
poly(age, 2, raw = TRUE)2:sexmixed  -2.969e-03  1.593e-03  -1.864 0.062353 .  
poly(age, 2, raw = TRUE)1:sexfemale -2.533e-02  5.670e-02  -0.447 0.655083    
poly(age, 2, raw = TRUE)2:sexfemale -8.760e-05  6.261e-04  -0.140 0.888740    
poly(age, 2, raw = TRUE)1:sexmale    8.733e-02  3.335e-02   2.619 0.008829 ** 
poly(age, 2, raw = TRUE)2:sexmale   -6.543e-04  3.369e-04  -1.942 0.052097 .  
---

For female it reproduces an error. The turning point of age for male: 0.08733/(2*0.0006543)=66.7 years old (also wrong).

If I include the age also in the model, it gives:

model <- glm(success ~ poly(age,2, raw = TRUE)*sex..., 
               data = data, family = "binomial")
summary(model)
...
Coefficients:
                                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)                         -1.799e+01  4.511e+00  -3.988 6.67e-05 ***
poly(age, 2, raw = TRUE)1:sexfemale -3.887e-01  1.753e-01  -2.218 0.026565 *  
poly(age, 2, raw = TRUE)2:sexfemale  2.881e-03  1.695e-03   1.699 0.089244 .  
poly(age, 2, raw = TRUE)1:sexmale   -2.760e-01  1.691e-01  -1.632 0.102586    
poly(age, 2, raw = TRUE)2:sexmale    2.314e-03  1.612e-03   1.436 0.150994    
---
  • the turning point of age for female: 0.3887/(2*0.002881)=67.5 years old (wrong)
  • the turning point of age for male: 0.276/(2*0.002314)=59.6 years old (somewhat close to visualisation assumption)

How do I need to use intercept or the base level coefficient (mixed) for finding the turing points for female and male? Also should I include the age and/or sex with interaction term in my case?

Saina V
  • 9
  • 4
  • Can you make your question reproducible by posting your data? – Otto Kässi Oct 24 '21 at 11:12
  • The difference could be due to the fact that you filter out everything where gender is neither male or female in the visualization, but not in the model. Try data=data[data$sex %in% c(‘female’,’male’),] – Otto Kässi Oct 24 '21 at 11:17
  • filtering the data also doesn't bring the expected result, what else could cause the issue? – Saina V Oct 24 '21 at 20:51
  • Please share your data using ?dput(); otherwise we are just guessing. – Otto Kässi Oct 25 '21 at 06:36

1 Answers1

2

Your question does not include a reproducible example, which makes answering somewhat difficult, and will reduce the usefulness of any answer you might get.

Nonetheless, we generally know from high school algebra, that a parabola has a turning point when its first derivative turns from positive to negative, i.e. when Age > -2a/b, where a is the coefficient on Age and b is the coefficient on Age^2.

To estimate separate a and b for males and females, you will need to include an interaction term between age and sex, which might look something line

glm(formula = success ~ sex + age:sex + I(age^2):sex, family = binomial(link = "logit"),
    data = data) -> model

You can could pick up the relevant values for a and b from coef(model).

Otto Kässi
  • 2,943
  • 1
  • 10
  • 27