
Say I have this df:

df <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8), q1 = c(1, 1, 4, 
5, 3, 3, 3, 2), q2 = c(5, 4, 4, 1, 1, 2, 3, 3), q3 = c(3, 3, 
2, 4, 3, 3, 2, 5), q4 = c(6, 5, 3, 3, 2, 1, 3, 4), q5 = c(2, 
1, 3, 4, 5, 4, 3, 2), v1 = c(0, 0, 1, 1, 1, 1, 0, 1), v2 = c("19-25", 
"19-25", "19-25", "26-34", "26-34", "35-44", "35-44", "35-44"
), v3 = c("abc", "def", "abc", "abc", "abc", "def", "def", "abc"
)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"
))

> df
# A tibble: 8 x 9
     id    q1    q2    q3    q4    q5    v1 v2    v3   
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1     1     1     5     3     6     2     0 19-25 abc  
2     2     1     4     3     5     1     0 19-25 def  
3     3     4     4     2     3     3     1 19-25 abc  
4     4     5     1     4     3     4     1 26-34 abc  
5     5     3     1     3     2     5     1 26-34 abc  
6     6     3     2     3     1     4     1 35-44 def  
7     7     3     3     2     3     3     0 35-44 def  
8     8     2     3     5     4     2     1 35-44 abc  

I want to run a series of models, each regressing one q* column on all the other columns (apart from id), and collect fit statistics such as R^2 for each. After the first model, the DV is swapped for the next q* column, and so on. The output should be a tibble with the standard output from broom::glance.

For example, the first model would be:

glance(
    lm(as.numeric(q1) ~ 
           v1 +
           as.factor(v2) + 
           as.factor(v3) + 
           q2 +
           q3 +
           q4 +
           q5,
       data = df))

# A tibble: 1 x 12
  r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1         1           NaN   NaN       NaN     NaN     7    Inf  -Inf  -Inf        0           0     8

And the second model would swap q1 with q2, so q2 becomes the dependent variable and q1 one of the independent variables. I would repeat this for all q* columns, so I end up with a tibble of 5 rows. The v* columns are included in each model but never become dependent variables.

I would also want an indicator in the final tibble for which dependent variable that model was run on (i.e. a column called dv that contained q1 when q1 was the dependent variable, or q2 when that was the dependent variable, etc.).

Is this possible? I'd rather avoid copying and pasting the above n times.

C.Robin

1 Answer


You could define

vars <- c("q1", "q2", "q3", "q4", "q5")

then iterate over it, building one formula per dependent variable:

library(broom)
library(dplyr)
library(purrr)

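# Build one formula string per dependent variable, then fit each model
vars %>% 
  map_chr(~ paste0("as.numeric(", .x, ") ~ v1 + as.factor(v2) + as.factor(v3) + ",
                   paste(vars[vars != .x], collapse = " + "))) %>% 
  map(~ .x %>% 
        as.formula() %>%   # string -> formula
        lm(data = df) %>%  # fit the linear model
        glance())          # one-row tibble of fit statistics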

This results in a list of five one-row tibbles:

[[1]]
# A tibble: 1 x 12
  r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1         1           NaN   NaN       NaN     NaN     7    Inf  -Inf  -Inf        0           0     8

[[2]]
# A tibble: 1 x 12
  r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1         1           NaN   NaN       NaN     NaN     7    Inf  -Inf  -Inf        0           0     8

[[3]]
# A tibble: 1 x 12
  r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1         1           NaN   NaN       NaN     NaN     7    Inf  -Inf  -Inf        0           0     8

[[4]]
# A tibble: 1 x 12
  r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1         1           NaN   NaN       NaN     NaN     7    Inf  -Inf  -Inf        0           0     8

[[5]]
# A tibble: 1 x 12
  r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
      <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1         1           NaN   NaN       NaN     NaN     7    Inf  -Inf  -Inf        0           0     8
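Every model reports r.squared = 1 with NaN for most other statistics because, with only 8 rows, each model is saturated: the intercept, v1, two v2 dummies, one v3 dummy and four q* predictors make 9 coefficients, so no residual degrees of freedom are left. A minimal check for the first model, assuming the data from the question (rebuilt here as a plain data.frame so the snippet runs on its own, and without as.numeric() since q1 is already numeric):

```r
# Example data from the question, as a plain data.frame
df <- data.frame(
  id = 1:8,
  q1 = c(1, 1, 4, 5, 3, 3, 3, 2),
  q2 = c(5, 4, 4, 1, 1, 2, 3, 3),
  q3 = c(3, 3, 2, 4, 3, 3, 2, 5),
  q4 = c(6, 5, 3, 3, 2, 1, 3, 4),
  q5 = c(2, 1, 3, 4, 5, 4, 3, 2),
  v1 = c(0, 0, 1, 1, 1, 1, 0, 1),
  v2 = c("19-25", "19-25", "19-25", "26-34", "26-34", "35-44", "35-44", "35-44"),
  v3 = c("abc", "def", "abc", "abc", "abc", "def", "def", "abc")
)

fit <- lm(q1 ~ v1 + as.factor(v2) + as.factor(v3) + q2 + q3 + q4 + q5, data = df)

n_obs  <- nrow(df)           # number of observations
n_coef <- length(coef(fit))  # coefficients in the model, including any NA (aliased) ones
res_df <- df.residual(fit)   # residual degrees of freedom

c(n_obs = n_obs, n_coef = n_coef, res_df = res_df)
```

With more rows than coefficients, the same code would give non-trivial R^2 values.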

Since the five results all have the same columns, you could combine them into one data.frame:

vars <- c("q1", "q2", "q3", "q4", "q5")
names(vars) <- vars

vars %>% 
  map_chr(~ paste0("as.numeric(", .x, ") ~ v1 + as.factor(v2) + as.factor(v3) + ",
                   paste(vars[vars != .x], collapse = " + "))) %>% 
  # the names set on vars carry through map_chr() and end up in the .id column
  map_df(~ .x %>% 
           as.formula() %>% 
           lm(data = df) %>% 
           glance(),
         .id = "dependent_var")

returning

# A tibble: 5 x 13
  dependent_var r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance
  <chr>             <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>
1 q1                    1           NaN   NaN       NaN     NaN     7    Inf  -Inf  -Inf        0
2 q2                    1           NaN   NaN       NaN     NaN     7    Inf  -Inf  -Inf        0
3 q3                    1           NaN   NaN       NaN     NaN     7    Inf  -Inf  -Inf        0
4 q4                    1           NaN   NaN       NaN     NaN     7    Inf  -Inf  -Inf        0
5 q5                    1           NaN   NaN       NaN     NaN     7    Inf  -Inf  -Inf        0
# ... with 2 more variables: df.residual <int>, nobs <int>
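
The question asked for the indicator column to be called dv. As a variant of the approach above, the formula can also be built with stats::reformulate() instead of pasting strings. This is only a sketch: fit_one and controls are names made up for the illustration, the data are rebuilt as a plain data.frame so the snippet is self-contained, and as.numeric() is dropped on the assumption that the q* columns are already numeric, as they are in the example data.

```r
library(broom)
library(dplyr)
library(purrr)

# Example data from the question, as a plain data.frame
df <- data.frame(
  id = 1:8,
  q1 = c(1, 1, 4, 5, 3, 3, 3, 2),
  q2 = c(5, 4, 4, 1, 1, 2, 3, 3),
  q3 = c(3, 3, 2, 4, 3, 3, 2, 5),
  q4 = c(6, 5, 3, 3, 2, 1, 3, 4),
  q5 = c(2, 1, 3, 4, 5, 4, 3, 2),
  v1 = c(0, 0, 1, 1, 1, 1, 0, 1),
  v2 = c("19-25", "19-25", "19-25", "26-34", "26-34", "35-44", "35-44", "35-44"),
  v3 = c("abc", "def", "abc", "abc", "abc", "def", "def", "abc")
)

vars     <- c("q1", "q2", "q3", "q4", "q5")
controls <- c("v1", "as.factor(v2)", "as.factor(v3)")  # hypothetical name: terms kept in every model

# Hypothetical helper: fit the model for one dependent variable and summarise it
fit_one <- function(dv) {
  # right-hand side = controls plus every q* column except the current DV
  f <- reformulate(c(controls, setdiff(vars, dv)), response = dv)
  glance(lm(f, data = df))
}

res <- vars %>%
  set_names() %>%          # name each element after itself
  map(fit_one) %>%         # list of one-row tibbles
  bind_rows(.id = "dv")    # stack them, keeping the names in a dv column

res
```

res then has one row per dependent variable, with dv as the first column followed by the glance() columns.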
Martin Gal