Create a loop in R to run all possible combinations of categorical variables(levels) in regression

Question

I am trying to run different multiple linear regression models in R with my sales dataset containing over 1000 products and 250 retailers. I am interested in looking at the model coefficients for each product and retailer combination. I tried using dummy variables for the categorical column but it didn't produce the individual estimates of the coefficients like I needed. How can one achieve this using a for loop that iterates over each possible combination?

score 2 · Answer 1 · answered Aug 31 '21 at 20:29

I like this tidyverse approach, where we nest the data within each group, i.e. "cut" and "color", standins for your "product" and "retailer" variables. Then we can run a linear regression within each group on all the other variables.

library(tidyverse); library(broom)
diamonds %>%
  nest(data = c(carat, clarity:z)) %>%
  mutate(fit = map(data, ~lm(price ~ ., data = .x)),
         tidied = map(fit, tidy)) %>%
  unnest(tidied)

Result

# A tibble: 486 x 9
   cut   color data             fit    term      estimate std.error statistic   p.value
   <ord> <ord> <list>           <list> <chr>        <dbl>     <dbl>     <dbl>     <dbl>
 1 Ideal E     <tibble [3,903 … <lm>   (Interce…   9902.     3099.       3.20 1.41e-  3
 2 Ideal E     <tibble [3,903 … <lm>   carat      16448.      263.      62.4  0        
 3 Ideal E     <tibble [3,903 … <lm>   clarity.L   4144.      131.      31.7  2.19e-196
 4 Ideal E     <tibble [3,903 … <lm>   clarity.Q  -1427.      128.     -11.2  1.44e- 28
 5 Ideal E     <tibble [3,903 … <lm>   clarity.C    962.      104.       9.22 4.72e- 20
 6 Ideal E     <tibble [3,903 … <lm>   clarity^4   -134.       75.2     -1.79 7.43e-  2
 7 Ideal E     <tibble [3,903 … <lm>   clarity^5     97.2      52.2      1.86 6.27e-  2
 8 Ideal E     <tibble [3,903 … <lm>   clarity^6     79.8      39.5      2.02 4.35e-  2
 9 Ideal E     <tibble [3,903 … <lm>   clarity^7    154.       33.6      4.58 4.84e-  6
10 Ideal E     <tibble [3,903 … <lm>   depth        -67.0      46.7     -1.43 1.52e-  1
# … with 476 more rows

Check out https://r4ds.had.co.nz/many-models.html#many-models or https://cran.r-project.org/web/packages/broom/vignettes/broom_and_dplyr.html. Other great tutorials online. — Jeff Parker, Aug 31 '21 at 20:32

Ben Bolker · Answer 2 · 2021-08-31T21:50:17.633

0

How about something like

library(lme4)
my_data <- transform(my_data, prcomb = interaction(product, retailer, drop = TRUE))
result <- lmList(sales ~ x1 + x2 + x3 | prcomb, data = my_data)

(including drop=TRUE because lmList doesn't like empty categories ...) ? (Hopefully you have fewer than the full 250,000 product/retailer combinations ...?)

For example:

mt <- transform(mtcars, cylam = interaction(cyl, am, drop=TRUE))
mm <- lme4::lmList(mpg ~ wt | cylam, data =mt)
coef(mm)
summary(mm)

(Note that the summary() method doesn't seem to be very graceful about categories with only one element in them ...)

edited Aug 31 '21 at 21:50

answered Aug 31 '21 at 20:29

Ben Bolker

211,554
25
370
453

1

250k, no? still a lot. – Jon Spring Aug 31 '21 at 20:31
Yeah. 250K regressions scares me **less** (edit) than 250 million. (In any case something with shrinkage would be a really good idea ...) – Ben Bolker Aug 31 '21 at 20:37
Hi Ben, How can you view the model summary from result? – asterisk21 Aug 31 '21 at 20:57

Create a loop in R to run all possible combinations of categorical variables(levels) in regression

2 Answers2