I am trying to run the bestglm function in R for subset selection, and the run fails immediately if I pass more than 15 variables to it. Sample code is below (I know these models have far too many variables for this dataset; I am only including them here as an example):
library(bestglm)  # provides bestglm()
cars.df = data.frame(mtcars)
cars.df
resp.var = cars.df$mpg
ind.matrix.15 = model.matrix(mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb + disp:wt + drat:wt + qsec:am + gear:hp + cyl:disp + drat:gear, data = cars.df)[, -1]
matrix.xy.15 = data.frame(ind.matrix.15, y = as.matrix(resp.var))
bestglm(Xy = matrix.xy.15, family = gaussian(link = 'log'), nvmax = 15)
ind.matrix.16 = model.matrix(mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb + disp:wt + drat:wt + qsec:am + gear:hp + cyl:disp + drat:gear + disp:hp, data = cars.df)[, -1]
matrix.xy.16 = data.frame(ind.matrix.16, y = as.matrix(resp.var))
bestglm(Xy = matrix.xy.16, family = gaussian(link = 'log'), nvmax = 16)
The first bestglm call runs fine, but when I add one more variable for a total of 16 features, the second bestglm call instantly produces this error message: p = 16. must be <= 15 for GLM.
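To confirm that the feature counts are what I think they are, here is a minimal sketch that uses only base R and the built-in mtcars data. The names f15, f16, x15 and x16 are introduced just for this check and do not appear in the code above. It rebuilds the two design matrices without the intercept column and counts their columns:

```r
# Rebuild the two design matrices from the question using base R only.
cars.df <- data.frame(mtcars)

# Formula with 9 main effects and 6 interaction terms.
f15 <- mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb +
  disp:wt + drat:wt + qsec:am + gear:hp + cyl:disp + drat:gear

# Same formula with one extra interaction term.
f16 <- update(f15, . ~ . + disp:hp)

# Drop the intercept column, as in the question.
x15 <- model.matrix(f15, data = cars.df)[, -1]
x16 <- model.matrix(f16, data = cars.df)[, -1]

ncol(x15)  # 15
ncol(x16)  # 16
```

So the value p = 16 in the error message matches the number of predictor columns in the second matrix.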
Changing the method argument to a simpler algorithm such as backward, rather than the default exhaustive, does not make the error go away.
Is this just a limitation of the bestglm function, or is there an argument I can change to allow more than 15 features?