I have built a logistic regression with polynomial relationship with variable age
, which I include in the model as an interaction with variable sex
. I also have other independent and control variables.
The main question is: How do I find a turning point for age
variable for female
and male
separately?
The model looks like this:
model <- glm(success ~ poly(age,2, raw = TRUE):sex + ...,
data = data, family = "binomial")
summary(model)
...
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -9.701e+00 1.525e+00 -6.362 1.99e-10 ***
poly(age, 2, raw = TRUE)1:sexmixed 4.245e-02 3.492e-02 1.216 0.224112
poly(age, 2, raw = TRUE)2:sexmixed -2.285e-05 4.216e-04 -0.054 0.956773
poly(age, 2, raw = TRUE)1:sexfemale 1.487e-01 3.608e-02 4.122 3.76e-05 ***
poly(age, 2, raw = TRUE)2:sexfemale -1.937e-03 4.527e-04 -4.278 1.88e-05 ***
poly(age, 2, raw = TRUE)1:sexmale 6.715e-02 3.116e-02 2.155 0.031164 *
poly(age, 2, raw = TRUE)2:sexmale -4.762e-04 3.179e-04 -1.498 0.134089
---
The dependent variable represents success
. mixed
is a reference level for sex
(here mixed refers to observations that are related to group of people combining both males and females, but I am only interested to find a turning point for female
and male
).
From the summary I can calculate:
- the turning point of age for female: 0.1487/(2*0.001937)=38 years old
- the turning point of age for male: 0.06715/(2*0.0004762)=70.5 years old.
But it doesn't seem to be right, because when I visualise it below:
ggplot(data %>% filter(sex %in% c('female', 'male')),
aes(age, success, color = sex)) +
geom_point(position = position_jitter(height = 0.007, width = 0)) +
stat_smooth(method = "glm", method.args = list(family = binomial),
formula = y ~ poly(x,2), alpha = 0.2, size=0.5, aes(fill = sex))
I get a plot, showing that turning point for men must be at ~58 years old (for female at ~36).
If I build model including sex
separately, I receive this:
model <- glm(success ~ poly(age,2, raw = TRUE):sex + sex + ...,
data = data, family = "binomial")
summary(model)
...
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.799e+01 4.511e+00 -3.988 6.67e-05 ***
poly(age, 2, raw = TRUE)1:sexmixed 3.634e-01 1.675e-01 2.169 0.030084 *
poly(age, 2, raw = TRUE)2:sexmixed -2.969e-03 1.593e-03 -1.864 0.062353 .
poly(age, 2, raw = TRUE)1:sexfemale -2.533e-02 5.670e-02 -0.447 0.655083
poly(age, 2, raw = TRUE)2:sexfemale -8.760e-05 6.261e-04 -0.140 0.888740
poly(age, 2, raw = TRUE)1:sexmale 8.733e-02 3.335e-02 2.619 0.008829 **
poly(age, 2, raw = TRUE)2:sexmale -6.543e-04 3.369e-04 -1.942 0.052097 .
---
For female it reproduces an error. The turning point of age for male: 0.08733/(2*0.0006543)=66.7 years old (also wrong).
If I include the age
also in the model, it gives:
model <- glm(success ~ poly(age,2, raw = TRUE)*sex...,
data = data, family = "binomial")
summary(model)
...
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.799e+01 4.511e+00 -3.988 6.67e-05 ***
poly(age, 2, raw = TRUE)1:sexfemale -3.887e-01 1.753e-01 -2.218 0.026565 *
poly(age, 2, raw = TRUE)2:sexfemale 2.881e-03 1.695e-03 1.699 0.089244 .
poly(age, 2, raw = TRUE)1:sexmale -2.760e-01 1.691e-01 -1.632 0.102586
poly(age, 2, raw = TRUE)2:sexmale 2.314e-03 1.612e-03 1.436 0.150994
---
- the turning point of age for female: 0.3887/(2*0.002881)=67.5 years old (wrong)
- the turning point of age for male: 0.276/(2*0.002314)=59.6 years old (somewhat close to visualisation assumption)
How do I need to use intercept
or the base level coefficient (mixed
) for finding the turing points for female
and male
? Also should I include the age
and/or sex
with interaction term in my case?