Here's a workflow based on the Many Models article, using broom
functions on nested data frames. Note that I increased the size of the data frame, just to have more observations to work with.
With this method, you group data by ID, nest it, and create the lm
objects by mapping over the nested data. broom::glance
gives you summary output on the models:
library(tidyverse)
library(broom)
set.seed(124)
df <- data.frame(id=sample(LETTERS[1:10], size = 10, replace = T),
d1=rnorm(100), d2=rnorm(100), d3=rnorm(100), t= rnorm(100), stringsAsFactors = F)
models_df <- df %>%
group_by(id) %>%
nest() %>%
mutate(mod = map(data, function(x) lm(t ~ d1 + d2 + d3, data = x))) %>%
mutate(glance = map(mod, glance)) %>%
unnest(glance)
models_df
#> # A tibble: 6 x 14
#> id data mod r.squared adj.r.squared sigma statistic p.value df
#> <chr> <list> <lis> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 A <tibb… <S3:… 0.495 0.243 0.955 1.96 0.221 4
#> 2 E <tibb… <S3:… 0.142 -0.0189 0.907 0.883 0.471 4
#> 3 F <tibb… <S3:… 0.179 0.0247 1.05 1.16 0.355 4
#> 4 D <tibb… <S3:… 0.818 0.727 0.522 8.97 0.0123 4
#> 5 C <tibb… <S3:… 0.0514 -0.0580 0.994 0.470 0.706 4
#> 6 J <tibb… <S3:… 0.202 -0.197 0.933 0.506 0.693 4
#> # ... with 5 more variables: logLik <dbl>, AIC <dbl>, BIC <dbl>,
#> # deviance <dbl>, df.residual <int>
Now that you have a data frame with summary statistics like R-squared, you can use regular dplyr
commands like top_n
to get rankings:
# smallest 5 p-value
models_df %>% top_n(-5, p.value)
#> # A tibble: 5 x 14
#> id data mod r.squared adj.r.squared sigma statistic p.value df
#> <chr> <list> <lis> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 A <tibb… <S3:… 0.495 0.243 0.955 1.96 0.221 4
#> 2 E <tibb… <S3:… 0.142 -0.0189 0.907 0.883 0.471 4
#> 3 F <tibb… <S3:… 0.179 0.0247 1.05 1.16 0.355 4
#> 4 D <tibb… <S3:… 0.818 0.727 0.522 8.97 0.0123 4
#> 5 J <tibb… <S3:… 0.202 -0.197 0.933 0.506 0.693 4
#> # ... with 5 more variables: logLik <dbl>, AIC <dbl>, BIC <dbl>,
#> # deviance <dbl>, df.residual <int>
# top r-sq
models_df %>% top_n(1, r.squared)
#> # A tibble: 1 x 14
#> id data mod r.squared adj.r.squared sigma statistic p.value df
#> <chr> <list> <lis> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 D <tibb… <S3:… 0.818 0.727 0.522 8.97 0.0123 4
#> # ... with 5 more variables: logLik <dbl>, AIC <dbl>, BIC <dbl>,
#> # deviance <dbl>, df.residual <int>
For significant d3
terms, I went back to the models data frame and mapped broom::tidy
across the models, which gives summary stats on each term, then filtered to study the d3
terms.
# significant d3--assuming alpha = 0.05
models_df %>%
mutate(tidied = map(mod, tidy)) %>%
unnest(tidied, .drop = T) %>%
filter(term == "d3") %>%
filter(p.value < 0.05)
#> # A tibble: 1 x 17
#> id r.squared adj.r.squared sigma statistic p.value df logLik AIC
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
#> 1 D 0.818 0.727 0.522 8.97 0.0123 4 -5.13 20.3
#> # ... with 8 more variables: BIC <dbl>, deviance <dbl>, df.residual <int>,
#> # term <chr>, estimate <dbl>, std.error <dbl>, statistic1 <dbl>,
#> # p.value1 <dbl>
Created on 2018-06-20 by the reprex package (v0.2.0).