I am trying to run different multiple linear regression models in R with my sales dataset containing over 1000 products and 250 retailers. I am interested in looking at the model coefficients for each product and retailer combination. I tried using dummy variables for the categorical column but it didn't produce the individual estimates of the coefficients like I needed. How can one achieve this using a for loop that iterates over each possible combination?
Asked
Active
Viewed 281 times
2 Answers
2
I like this tidyverse approach, where we nest the data within each group, i.e. "cut" and "color", standins for your "product" and "retailer" variables. Then we can run a linear regression within each group on all the other variables.
library(tidyverse); library(broom)
diamonds %>%
nest(data = c(carat, clarity:z)) %>%
mutate(fit = map(data, ~lm(price ~ ., data = .x)),
tidied = map(fit, tidy)) %>%
unnest(tidied)
Result
# A tibble: 486 x 9
cut color data fit term estimate std.error statistic p.value
<ord> <ord> <list> <list> <chr> <dbl> <dbl> <dbl> <dbl>
1 Ideal E <tibble [3,903 … <lm> (Interce… 9902. 3099. 3.20 1.41e- 3
2 Ideal E <tibble [3,903 … <lm> carat 16448. 263. 62.4 0
3 Ideal E <tibble [3,903 … <lm> clarity.L 4144. 131. 31.7 2.19e-196
4 Ideal E <tibble [3,903 … <lm> clarity.Q -1427. 128. -11.2 1.44e- 28
5 Ideal E <tibble [3,903 … <lm> clarity.C 962. 104. 9.22 4.72e- 20
6 Ideal E <tibble [3,903 … <lm> clarity^4 -134. 75.2 -1.79 7.43e- 2
7 Ideal E <tibble [3,903 … <lm> clarity^5 97.2 52.2 1.86 6.27e- 2
8 Ideal E <tibble [3,903 … <lm> clarity^6 79.8 39.5 2.02 4.35e- 2
9 Ideal E <tibble [3,903 … <lm> clarity^7 154. 33.6 4.58 4.84e- 6
10 Ideal E <tibble [3,903 … <lm> depth -67.0 46.7 -1.43 1.52e- 1
# … with 476 more rows

Jon Spring
- 55,165
- 4
- 35
- 53
-
Check out https://r4ds.had.co.nz/many-models.html#many-models or https://cran.r-project.org/web/packages/broom/vignettes/broom_and_dplyr.html. Other great tutorials online. – Jeff Parker Aug 31 '21 at 20:32
0
How about something like
library(lme4)
my_data <- transform(my_data, prcomb = interaction(product, retailer, drop = TRUE))
result <- lmList(sales ~ x1 + x2 + x3 | prcomb, data = my_data)
(including drop=TRUE
because lmList
doesn't like empty categories ...)
? (Hopefully you have fewer than the full 250,000 product/retailer combinations ...?)
For example:
mt <- transform(mtcars, cylam = interaction(cyl, am, drop=TRUE))
mm <- lme4::lmList(mpg ~ wt | cylam, data =mt)
coef(mm)
summary(mm)
(Note that the summary()
method doesn't seem to be very graceful about categories with only one element in them ...)

Ben Bolker
- 211,554
- 25
- 370
- 453
-
1
-
Yeah. 250K regressions scares me **less** (edit) than 250 million. (In any case something with shrinkage would be a really good idea ...) – Ben Bolker Aug 31 '21 at 20:37
-