0

I would like to programmatically build glms in r, similarly to what's described here (How to build and test multiple models in R), except testing all possible subsets of predictor variables instead.

So, for a dataset like this, with outcome variable z:

data <- data.frame("z" = rnorm(20, 15, 3),
                   "a" = rnorm(20, 20, 3),
                   "b" = rnorm(20, 25, 3),
                   "c" = rnorm(20, 5, 1))

is there a way to automate building the models:

m1 <- glm(z ~ a, data = data)
m2 <- glm(z ~ b, data = data)
m3 <- glm(z ~ c, data = data)
m4 <- glm(z ~ a + b, data = data)
m5 <- glm(z ~ a + c, data = data)
m6 <- glm(z ~ b + c, data = data)
m7 <- glm(Z ~ a + b + c, data = data)

I know the dredge function of the MuMIn package can do this, but I got an error saying that I was including too many variables, so I'm looking for ways to do this independently of dredge. I've tried grid.expand() and combn(), map() and lapply() variants of answers I've found on StackOverflow and can't seem to piece this together. Ideally, model output, including BIC, would be stored in a sortable dataframe.

Any help would be greatly appreciated!!

Community
  • 1
  • 1
  • 2
    *"testing all possible subsets of predictor variables"* From a statistics point of view, that's a very dangerous game to play. I would advise against doing that; instead I'd recommend looking into feature selection methods. – Maurits Evers Jan 16 '20 at 23:14
  • I certainly agree! The code I am trying to replicate in R was originally written by others for Eviews as described in this article (https://psycnet.apa.org/record/2016-43095-009). Thanks for the tip about feature selection! I will certainly take this into account as I'm trying to understand what they did. – williakrt Jan 16 '20 at 23:27
  • If that's really what they did (can't access the paper as it's paywalled) I'd stay as far away from it as possible. I've got no idea what this SDA does but generally (and in response to your question) a good approach would be to model everything (e.g. in a fully Bayesian model) and use regularising priors on the parameters. – Maurits Evers Jan 17 '20 at 00:05

1 Answers1

1

Assuming you have taken note of @Maurits Evers' comment, you can achieve what you want to do by combination of lapply and combn

cols <- names(data)[-1]
lapply(seq_along(cols), function(x) combn(cols, x, function(y) 
       glm(reformulate(y, "z"), data = data), simplify = FALSE))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213