Fit series of linear models to separate dependent variables for a different grouping variable

Question

Hi: I have six continuous dependent variables and one independent variable for three countires. I would like to see what the coefficient is for y1 to y6 ~ x1 for each country. Is there a way to do this neatly with dplyr and broom? I know dplyr fairly well but am new to broom.

#one random independent variable 
x1<-rnorm(100, mean=5, sd=1)
#one random dependent variable
y1<-rnorm(100, mean=2, sd=2)
#two random dependent variables, in reality I have six
y2<-rnorm(100, mean=3, sd=1)
#Grouping variable. 
country<-sample(seq(1,3,1), size=100, replace=T)
#data frame
df<-data.frame(x1, y1, y2, country)
#I would like to see what the coefficients are for y1~x1 
and then y2 ~x2 for   country 1, country 2, country 3, etc. 
library(dplyr)
#Fit one model for each of three countries
test<-df%>%
 group_by(country) %>%
  do(mod.y1=lm(y1~x1, data=.))
#print results
test$mod.y1

David Robinson · Accepted Answer · 2016-03-22T17:43:36.187

You can use a combination of gather from tidyr and tidy from broom. First, instead of doing one fit for each country, do one fit for each combination of y1/y2 and country:

library(tidyr)
library(broom)

fits <- df %>%
  gather(variable, value, y1, y2) %>%
  group_by(country, variable) %>%
  do(mod = lm(value ~ x1, .))

Then you can tidy them (with broom) and filter out the intercept term:

td <- tidy(fits, mod) %>%
  filter(term != "(Intecept)")

This gives you a data frame td that looks like:

Source: local data frame [6 x 7]
Groups: country, variable [6]

  country variable  term     estimate std.error   statistic   p.value
    (dbl)    (chr) (chr)        (dbl)     (dbl)       (dbl)     (dbl)
1       1       y1    x1  0.106140467 0.3835857  0.27670599 0.7835458
2       1       y2    x1 -0.004725751 0.1837192 -0.02572268 0.9796168
3       2       y1    x1 -0.193700062 0.4690913 -0.41292614 0.6826979
4       2       y2    x1  0.094083592 0.2024151  0.46480518 0.6455421
5       3       y1    x1 -0.223523980 0.3820297 -0.58509584 0.5631692
6       3       y2    x1 -0.029720338 0.2116219 -0.14044074 0.8893172

Your estimate column is the estimated coefficients.

OK, this is perfect, but I don't quite understand the syntax of the arguments to tidy. How does tidy(test, mod.y1) relate to other variable names in other models? — spindoctor, Mar 22 '16 at 17:43
@spindoctor There was a typo in my answer (now fixed), should be `tidy(fits, mod)`. The `mod` argument tells it that the models we want to tidy are in the `mod` column (as defined in `do(mod =` above). Try `help(rowwise_df_tidiers)` in R for more (or look [here](http://www.inside-r.org/node/355600)) — David Robinson, Mar 22 '16 at 17:44
Got it! The answer is that the argument y in tidy(x, y) is the name of the model that was fit in the previous do() command. — spindoctor, Mar 22 '16 at 17:44

Fit series of linear models to separate dependent variables for a different grouping variable

1 Answers1