I am trying to fit a logistic regression with a couple of predictors. Plotting the response against one of them and overlaying a loess curve suggests a quadratic trend, so I want to add a quadratic term for that variable to the model. I'm having trouble working out the correct way to do this.
Example below.
Create the data frame:
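library(ggplot2)  # ggplot(), used for the plot below
library(dplyr)    # %>% and arrange(), used for the selection table below

set.seed(1)
df <- data.frame(response = c(rep(0, times = 30), rep(1, times = 20)),
                 var1 = runif(50, min = 12, max = 30),
                 var2 = c(runif(20, min = 0, max = 25),
                          runif(10, min = 30, max = 50),
                          runif(20, min = 15, max = 40)),
                 var3 = var2^2) # note that this is just var2 squared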
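One sanity check that may be worth running at this point: `data.frame()` evaluates each argument in the calling environment, so `var2` inside `var2^2` does not refer to the `var2` column defined one line earlier. The sketch below uses made-up toy values and a hypothetical leftover vector (neither is from the data above) to show how the column and its supposed square can diverge.

```r
# Toy illustration with made-up values; the leftover `var2` vector is hypothetical.
toy <- local({
  var2 <- c(100, 200, 300)       # stands in for a var2 left over in the workspace
  data.frame(var2 = c(1, 2, 3),
             var3 = var2^2)      # var2 resolves to the leftover vector, not the column
})
matches_before <- isTRUE(all.equal(toy$var3, toy$var2^2))

# Computing the square from the column itself removes the ambiguity.
toy$var3 <- toy$var2^2
matches_after <- isTRUE(all.equal(toy$var3, toy$var2^2))

print(c(before = matches_before, after = matches_after))
```

On the real data, the equivalent check would be `all.equal(df$var3, df$var2^2)`.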
Plot the response against var2 to view the quadratic trend:
ggplot(df, aes(x = var2, y = response)) +
  geom_point() +
  geom_smooth(method = "loess") +
  coord_cartesian(ylim = c(0, 1))
Test a few different model formulas:
formulas <- list(response ~ var1 + var2,              # both vars linear
                 response ~ var1 + var2 + I(var2^2),  # add quad term for var2
                 response ~ var1 + I(var2^2),         # only quad term for var2
                 response ~ var1 + var2 + var3,       # add var3, which is var2^2
                 response ~ var1 + var3)              # only var1 and var3
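As a side note on the `I()` wrapper used in these formulas: inside an R formula, `^` is a formula operator (term crossing) rather than arithmetic, so a bare `x^2` collapses to `x`. The minimal sketch below, with a made-up toy data frame and hypothetical column names `x` and `y`, inspects the design matrix to show which columns each spelling actually produces.

```r
# Toy data with made-up values; `x` and `y` are hypothetical names.
d <- data.frame(y = c(0, 1, 0, 1), x = c(1, 2, 3, 4))

# With I(), x^2 is evaluated arithmetically and gets its own column.
cols_with_I <- colnames(model.matrix(y ~ x + I(x^2), data = d))

# Without I(), x^2 is read as a formula operation and reduces to x alone.
cols_without_I <- colnames(model.matrix(y ~ x + x^2, data = d))

print(cols_with_I)
print(cols_without_I)
```

`poly(x, 2)` is another common way to request linear and quadratic terms together, using orthogonal polynomials by default.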
# build a df of some model selection criteria:
selection <- purrr::map_df(formulas, ~{
  mod <- glm(.x, data = df, family = "binomial")
  data.frame(formula = format(.x),
             AIC = round(AIC(mod), 2),
             BIC = round(BIC(mod), 2),
             R2adj = round(DescTools::PseudoR2(mod, which = c("McFaddenAdj")), 4)
  )
}) %>% arrange(desc(AIC))
View the selection criteria:
> selection
                             formula   AIC   BIC  R2adj
1        response ~ var1 + I(var2^2) 65.88 71.62 0.0211
2             response ~ var1 + var2 65.26 70.99 0.0304
3      response ~ var1 + var2 + var3 64.69 72.33 0.0389
4             response ~ var1 + var3 63.18 68.91 0.0613
5 response ~ var1 + var2 + I(var2^2) 45.09 52.74 0.3300
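Basically, I'm wondering: can someone explain why these results are all different? Which formula should I use to model one variable with a quadratic trend? In particular, why do the models with I(var2^2) and with var3 give such different results when var3 is supposed to be var2 squared?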