Let's say I have the following data:

ind1 <- rnorm(99)
ind2 <- rnorm(99)
ind3 <- rnorm(99)
ind4 <- rnorm(99)
ind5 <- rnorm(99)
dep <- rnorm(99, mean=ind1)
group <- rep(c("A", "B", "C"), each=33)
df <- data.frame(dep,group, ind1, ind2, ind3, ind4, ind5)

The following code fits a multiple linear regression of the dependent variable on two independent variables by group, which is exactly what I want. But I want to regress dep against every pair of independent variables at once. How can I combine the other models into this code?

library(tidyverse)  # dplyr, tidyr (nest/unnest), purrr (map)
library(broom)      # glance(), tidy()

df %>% 
  nest(-group) %>% 
  mutate(fit = map(data, ~ lm(dep ~ ind1 + ind2, data = .)),
         results1 = map(fit, glance),
         results2 = map(fit, tidy)) %>% 
  unnest(results1) %>% 
  unnest(results2) %>% 
  select(group, term, estimate, r.squared, p.value, AIC) %>% 
  mutate(estimate = exp(estimate)) 

Thanks in advance!

R starter
  • Note more than `dplyr` is used in attempted code: `dplyr != tinyverse` but `dplyr %in% tinyverse`. – Parfait May 11 '19 at 18:52

1 Answer


This is not a fully tidy answer. Consider building every combination of independent variables with lapply and combn, turning each combination into a formula with rapply, and then passing each formula into your tidy method:

indvar_list <- lapply(1:5, function(x) 
                 combn(paste0("ind", 1:5), x, simplify = FALSE))

formulas_list <- rapply(indvar_list, function(x)
                   as.formula(paste("dep ~", paste(x, collapse="+"))))

run_model <- function(f) {    
    df %>% 
      nest(-group) %>% 
      mutate(fit = map(data, ~ lm(f, data = .)),
             results1 = map(fit, glance),
             results2 = map(fit, tidy)) %>% 
      unnest(results1) %>% 
      unnest(results2) %>% 
      select(group, term, estimate, r.squared, p.value, AIC) %>% 
      mutate(estimate = exp(estimate))
}

tibble_list <- lapply(formulas_list, run_model)
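
The code above builds every subset of the five predictors (31 models), while the question asks only for pairs. A minimal sketch of the pairs-only case follows, assuming base R's `reformulate` is an acceptable swap for the `paste`/`as.formula` step; the names `ind_vars`, `pairs_list` and `pair_formulas` are made up for illustration.

```r
# Assumption: only two-variable models are wanted, as in the question.
ind_vars <- paste0("ind", 1:5)

# All unordered pairs of predictors: choose(5, 2) = 10 of them
pairs_list <- combn(ind_vars, 2, simplify = FALSE)

# reformulate() builds `dep ~ a + b` from a character vector of term labels
pair_formulas <- lapply(pairs_list, reformulate, response = "dep")

# Name each formula by its right-hand side so results can be traced back
names(pair_formulas) <- vapply(pairs_list, paste, character(1),
                               collapse = " + ")

length(pair_formulas)           # 10
pair_formulas[["ind1 + ind2"]]  # dep ~ ind1 + ind2
```

Because the list is named, something like `dplyr::bind_rows(lapply(pair_formulas, run_model), .id = "model")` should stack the per-formula tibbles into one table with a `model` column, provided `run_model` runs under your installed tidyr version.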
Parfait
  • @Parfait Thank you! What if my independent variables have different names, such as height, weight, volume, mass, etc. (not all starting with ind)? How would I create indvar_list? – R starter May 11 '19 at 20:05
  • Then replace input argument of `combn` with a vector of variable names: `combn(c("type", "height", "weight", "volume", "mass"), x, simplify=FALSE)`. Be sure the `lapply` runs in sequence from 1 to total number of independent variables. Also, see edit with syntax corrections regarding `rapply`. – Parfait May 11 '19 at 22:37
  • @Parfait Thank you very much! It helped me a lot. – R starter May 12 '19 at 10:55
  • I am sorry. I ran the code, but I am getting this for every combination: "[[31]] function (f) { df %>% nest(-group) %>% mutate(fit = map(data, ~lm(f, data = .)), results1 = map(fit, glance), results2 = map(fit, tidy)) %>% unnest(results1) %>% unnest(results2) %>% select(group, term, estimate, r.squared, p.value, AIC) %>% mutate(estimate = exp(estimate)). }". – R starter May 12 '19 at 18:17
  • Hmmmm...carefully check implementation. You may have changed something from this solution as your method appears to return the function's definition and not function's evaluation. See output I get for the 31st item of *tibble_list*: https://pastebin.com/70HAqhas. – Parfait May 12 '19 at 20:43
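
The two points from the comments can be sketched in base R. This is only an illustration: it uses the variable names from the comment above, swaps `rapply` for `unlist(recursive = FALSE)`, and `count_vars` is a made-up stand-in for `run_model`. The "wrong" call is one hypothetical way to get function definitions back instead of results, not necessarily what happened in the asker's code.

```r
# Predictor names taken from the comment thread (not columns of df)
pred_vars <- c("type", "height", "weight", "volume", "mass")

# Subsets of every size from 1 to length(pred_vars), flattened one level
subsets <- unlist(
  lapply(seq_along(pred_vars),
         function(k) combn(pred_vars, k, simplify = FALSE)),
  recursive = FALSE
)

# One formula per subset: 2^5 - 1 = 31 in total
formulas <- lapply(subsets, reformulate, response = "dep")
length(formulas)  # 31

# Hypothetical stand-in for run_model: counts variables in a formula
count_vars <- function(f) length(all.vars(f))

# Returns the function itself 31 times (definition, not evaluation)
wrong <- lapply(formulas, function(f) count_vars)

# Applies the function to each formula
right <- lapply(formulas, count_vars)
```

Printing `wrong[[31]]` shows the body of `count_vars`, much like the output quoted in the comment, while `right[[31]]` is the evaluated result.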