How to build and test multiple models in R

Question

Suppose I have a dataset like (forget about the distribution):

modData <- data.frame("A" = rnorm(20, 15, 3),
                      "B" = rnorm(20, 20, 3),
                      "C" = rnorm(20, 25, 3),
                      "X" = rnorm(20, 5, 1)
                      )

If I use X as a predictor, A, Band C as responses, respectively:

md1 <- lm(A ~ X, data = modData)
md2 <- lm(B ~ X, data = modData)
md3 <- lm(C ~ X, data = modData)

Then do a Shapiro test and a boxcox test to every model, e.g. :

shapiro.test(residuals(md1))
boxcox(md1, plotit = T)

Is there a convenient way to build and test multiple models without manually typing each of them?

Thank you for providing a reproducible example and clear statement of what you're after. Novice visitors could learn a lot from questions like this one. — Roman Luštrik, May 19 '18 at 11:24

score 5 · Accepted Answer · answered May 19 '18 at 09:51

Here's an alternative approach using tidyverse:

modData <- data.frame("A" = rnorm(20, 15, 3),
                      "B" = rnorm(20, 20, 3),
                      "C" = rnorm(20, 25, 3),
                      "X" = rnorm(20, 5, 1))
library(tidyverse)
library(broom)

# specify predictor and target variables
x = "X"
y = names(modData)[names(modData)!= x]

expand.grid(y,x) %>%                                    # create combinations
  mutate(model_id = row_number(),                       # create model id
         frml = paste0(Var1, "~", Var2)) %>%            # create model formula
  group_by(model_id, Var1, Var2) %>%                    # group by the above
  nest() %>%                                            # nest data
  mutate(m = map(data, ~lm(.$frml, data = modData)),    # create models
         m_table = map(m, ~tidy(.)),                    # tidy model output
         st = map(m, ~shapiro.test(residuals(.)))) -> dt_model_info  # shapiro test

# access model info
dt_model_info
dt_model_info$m
dt_model_info$m_table
dt_model_info$st

# another way to access info
dt_model_info %>% unnest(m_table)

Andrii · Answer 2 · 2018-05-19T09:25:06.877

2

Here is an approach by using simple "lapply":

# 1. Data set
df <- data.frame(
  a = rnorm(20, 15, 3),
  b = rnorm(20, 20, 3),
  c = rnorm(20, 25, 3),
  x = rnorm(20, 5, 1))

# 2. Models
fit_lm_a <- lm(a ~ x, df)
fit_lm_b <- lm(b ~ x, df)
fit_lm_c <- lm(c ~ x, df)

# 3. List of models
list_fit_lm <- list(fit_lm_a, fit_lm_b,fit_lm_c)

# 3. Shapiro test
lapply(
  list_fit_lm, function(x) {
    shapiro.test(residuals(x)) 
  })

 # 4. Box-Cox Transformations
 lapply(
  list_fit_lm, function(x) {
    boxcox(x, plotit = TRUE, data = df)
  }
 )

edited May 19 '18 at 09:25

answered May 19 '18 at 09:21

Andrii

2,843
27
33

1

Thanks but what about the building procedure? My imagined pseudo code: `listOfModels <- lm(a : c ~ x, df)` – Tianjian Qin May 19 '18 at 09:24
Thanks! I did the same :) – Andrii May 19 '18 at 09:32

score 2 · Answer 3 · answered May 19 '18 at 11:23

In case you wouldn't want to introduce dozens of dependencies, you can do it with a simple sapply. Notice that I don't provide a boxcox part because I don't know where it comes from (car, MASS?).

modData <- data.frame("A" = rnorm(20, 15, 3),
                      "B" = rnorm(20, 20, 3),
                      "C" = rnorm(20, 25, 3),
                      "X" = rnorm(20, 5, 1))

deps <- c("A", "B", "C")
indeps <- c("X")

result <- sapply(deps, FUN = function(x, indeps, mydata) {
  myformula <- formula(sprintf("%s ~ %s", x, indeps))

  model <- lm(myformula, data = mydata)
  out.shapiro <- shapiro.test(residuals(model))

  return(list(model = model, shapiro = out.shapiro))
}, indeps = indeps, mydata = modData, simplify = FALSE)

score 1 · Answer 4 · answered Jan 26 '21 at 10:02

1

data("mtcars")

formulas <- list(
  mpg ~ disp,
  mpg ~ disp + wt
)

res <- vector("list", length = length(formulas))

for(i in seq_along(formulas)){
  res[[i]] <- lm(formulas[[i]], data = mtcars)}
res

lapply(formulas, lm, data = mtcars)

answered Jan 26 '21 at 10:02

Seyma Kalay

2,037
10
22

While this code may provide a solution to the question, it's better to add context as to why/how it works. This can help future users learn, and apply that knowledge to their own code. You are also likely to have positive feedback from users in the form of upvotes, when the code is explained. – Amit Verma Jan 26 '21 at 12:17
You should provide an explanation and a rationale with an answer, otherwise you risk that this answer would be deleted. – Bas Jansen Jan 26 '21 at 15:09

How to build and test multiple models in R

4 Answers4

Linked