I am working through the practical sheet at this link: http://www.statslab.cam.ac.uk/~rds37/teaching/statistical_modelling/Practical6.pdf
In exercise 3, it is stated that to test whether the mother's myopia and the father's myopia are equally significant, we must create a new variable
mumORdadMyopic <- (dadMyopic == "Yes") | (mumMyopic == "Yes")
and then fit the model with the variables dadMyopic and mumORdadMyopic. Using the previous exercises as a guide, we'd then perform
data_1 <- myopic %>% dplyr::select(-compH, -TVHR)
data_2 <- myopic %>% dplyr::select(-compH, -TVHR, -mumMyopic) %>% cbind(mumORdadMyopic)
model_1 <- glm(myopic ~ ., data = data_1, family = binomial)
model_2 <- glm(myopic ~ ., data = data_2, family = binomial)
anova(model_1, model_2, test = "LR")
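To make the steps above reproducible without the sheet's dataset, here is a minimal sketch on simulated data. The data frame `sim`, the effect sizes, and the names `sim_model_1` and `sim_model_2` are my own assumptions for illustration, not anything from the practical; only the variable names `myopic`, `mumMyopic`, `dadMyopic` and `mumORdadMyopic` follow the sheet. I use base R only and spell the test as "LRT".

```r
# Simulated stand-in for the sheet's data (hypothetical values, not the real dataset).
set.seed(1)
n <- 500
sim <- data.frame(
  mumMyopic = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  dadMyopic = factor(sample(c("No", "Yes"), n, replace = TRUE))
)

# Outcome generated with arbitrary, made-up effects for each parent.
eta <- -1.5 + 0.8 * (sim$mumMyopic == "Yes") + 0.8 * (sim$dadMyopic == "Yes")
sim$myopic <- rbinom(n, size = 1, prob = plogis(eta))

# Derived indicator: TRUE when at least one parent is myopic.
sim$mumORdadMyopic <- (sim$dadMyopic == "Yes") | (sim$mumMyopic == "Yes")

# Model with both parents entered separately.
sim_model_1 <- glm(myopic ~ mumMyopic + dadMyopic, data = sim, family = binomial)

# Model with mumMyopic swapped for the derived indicator.
sim_model_2 <- glm(myopic ~ dadMyopic + mumORdadMyopic, data = sim, family = binomial)

# Deviance comparison of the two fits, as in the anova() call above.
print(anova(sim_model_1, sim_model_2, test = "LRT"))
```

In this simulated setup the number of coefficients in each fit can be inspected with `length(coef(sim_model_1))` and `length(coef(sim_model_2))`, which may help when reading the degrees of freedom in the anova table.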
My question is this:
With the aim of testing whether mumMyopic and dadMyopic are equally significant variables, why is the above what we want to perform? I would have thought that we would want to fit one model without mumMyopic, and one without dadMyopic, and compare the performance of the models.
newdata_1 <- myopic %>% dplyr::select(-compH, -TVHR, -dadMyopic)
newdata_2 <- myopic %>% dplyr::select(-compH, -TVHR, -mumMyopic)
newmodel_1 <- glm(myopic ~ ., data = newdata_1, family = binomial)
newmodel_2 <- glm(myopic ~ ., data = newdata_2, family = binomial)
anova(newmodel_1, newmodel_2, test = "LR")
where, if newmodel_1 and newmodel_2 are significantly different (in a statistical sense), we could reject the hypothesis that mumMyopic and dadMyopic have the same predictive power.
Can someone explain why my suggested approach doesn't achieve what I want, and why the exercise's intended approach is the right one?
Thanks!