Let's imagine we want to model the US State Public-School Expenditures (education) using income, young, urban and region as regressors. For more info: ?Anscombe
Model: education ~ (income+young+urban)*region
library(car)
library(leaps)
#Loading Data
data(Anscombe)
data(state)
stateinfo <- data.frame(region=state.region,row.names=state.abb)
datamodel <- data.frame(merge(stateinfo,Anscombe,by="row.names"),row.names=1)
head(datamodel)
region education income young urban
AK West 372 4146 439.7 484
AL South 112 2337 362.2 584
AR South 134 2322 351.9 500
AZ West 207 3027 387.5 796
CA West 273 3968 348.4 909
CO West 192 3340 358.1 785
#Saturated Model
MOD1 <- lm(education~(.-region)*region,datamodel)
summary(MOD1)
#anova(MOD1)
#Look for the "best" model
MOD1.subset <- regsubsets(education~(.-region)*region,datamodel,nvmax=15)
plot(MOD1.subset)
The model with 3 variables and 1 interaction (education~income+young+urban+RegionWest:young) seems to be the best in terms of BIC.
coef(MOD1.subset,4)
The questions is, how do I obtain the ML object from that model without writing the formula by hand?
Before posting, I found the package HH which has some interesting functions for regsubsets objects such as summaryHH
and lm.regsubsets
.
library(HH)
summaryHH(MOD1.subset)[4,]
lm.regsubsets(MOD1.subset,4)
lm.regsubsets
does what I want but I think that has some problems parsing interactions, any alternatives?