There's a few basic steps involved to get what you want:
- define the model grid of all potential predictor combinations
- model run all potential combinations of predictors
- use a criteria (or a set of multiple criteria) to select the best subset of predictors
The model grid can be defined with the following function:
# define model grid for best subset regression
# defines which predictors are on/off; all combinations presented
model.grid <- function(n){
n.list <- rep(list(0:1), n)
expand.grid(n.list)
}
For example with 4 variables, we get n^2 or 16 combinations. A value of 1 indicates the model predictor is on and a value of zero indicates the predictor is off:
model.grid(4)
Var1 Var2 Var3 Var4
1 0 0 0 0
2 1 0 0 0
3 0 1 0 0
4 1 1 0 0
5 0 0 1 0
6 1 0 1 0
7 0 1 1 0
8 1 1 1 0
9 0 0 0 1
10 1 0 0 1
11 0 1 0 1
12 1 1 0 1
13 0 0 1 1
14 1 0 1 1
15 0 1 1 1
16 1 1 1 1
I provide another function below that will run all model combinations. It will also create a sorted dataframe table that ranks the different model fits using 5 criteria. The predictor combo at the top of the table is the "best" subset given the training data and the predictors supplied:
# function for best subset regression
# ranks predictor combos using 5 selection criteria
best.subset <- function(y, x.vars, data){
# y character string and name of dependent variable
# xvars character vector with names of predictors
# data training data with y and xvar observations
require(dplyr)
reguire(purrr)
require(magrittr)
require(forecast)
length(x.vars) %>%
model.grid %>%
apply(1, function(x) which(x > 0, arr.ind = TRUE)) %>%
map(function(x) x.vars[x]) %>%
.[2:dim(model.grid(length(x.vars)))[1]] %>%
map(function(x) tslm(paste0(y, " ~ ", paste(x, collapse = "+")), data = data)) %>%
map(function(x) CV(x)) %>%
do.call(rbind, .) %>%
cbind(model.grid(length(x.vars))[-1, ], .) %>%
arrange(., AICc)
}
You'll see the tslm() function is specified...others could be used such as vglm(), etc. Simply swap in the model function you want.
The function requires 4 installed packages. The function simply configures data and uses the map() function to iterate across all model combinations (e.g. no for loop). The forecast package then supplies the cross-validation function CV(), which has the 5 metrics or selection criteria to rank the predictor subsets
Here is an application example lifted from the book "Forecasting Principles and Practice." The example also uses data from the book, which is found in the fpp2 package.
library(fpp2)
# test the function
y <- "Consumption"
x.vars <- c("Income", "Production", "Unemployment", "Savings")
best.subset(y, x.vars, uschange)
The resulting table, which is sorted on the AICc metric, is shown below. The best subset minimizes the value of the metrics (CV, AIC, AICc, and BIC), maximizes adjusted R-squared and is found at the top of the list:
Var1 Var2 Var3 Var4 CV AIC AICc BIC AdjR2
1 1 1 1 1 0.1163 -409.3 -408.8 -389.9 0.74859
2 1 0 1 1 0.1160 -408.1 -407.8 -391.9 0.74564
3 1 1 0 1 0.1179 -407.5 -407.1 -391.3 0.74478
4 1 0 0 1 0.1287 -388.7 -388.5 -375.8 0.71640
5 1 1 1 0 0.2777 -243.2 -242.8 -227.0 0.38554
6 1 0 1 0 0.2831 -237.9 -237.7 -225.0 0.36477
7 1 1 0 0 0.2886 -236.1 -235.9 -223.2 0.35862
8 0 1 1 1 0.2927 -234.4 -234.0 -218.2 0.35597
9 0 1 0 1 0.3002 -228.9 -228.7 -216.0 0.33350
10 0 1 1 0 0.3028 -226.3 -226.1 -213.4 0.32401
11 0 0 1 1 0.3058 -224.6 -224.4 -211.7 0.31775
12 0 1 0 0 0.3137 -219.6 -219.5 -209.9 0.29576
13 0 0 1 0 0.3138 -217.7 -217.5 -208.0 0.28838
14 1 0 0 0 0.3722 -185.4 -185.3 -175.7 0.15448
15 0 0 0 1 0.4138 -164.1 -164.0 -154.4 0.05246
Only 15 predictor combinations are profiled in the output since the model combination with all predictors off has been dropped. Looking at the table, the best subset is the one with all predictors on. However, the second row uses only 3 of 4 variables and the performance results are roughly the same. Also note that after row 4, the model results begin to degrade. Thats because income and savings appear to be the key drivers of consumption. As these two variables are dropped from the predictors, model performance drops significantly.
The performance of the custom function is solid since the results presented here match those of the book referenced.
A good day to you.