1

I have a data frame that looks a bit like this:

example <- tribble(
  ~PERSON_ID, ~RANGE, ~outcome1, ~outcome2, ~outcome3, ~type,
  "1", "PRE", 1, 2, 3, "type1",
  "1", "POST", 2, 3, 4, "type1",
  "2", "PRE", 3, 4, 5, "type2",
  "2", "POST", 5, 6, 7, "type2",
  "3", "PRE", 6, 7, 8, "type3",
  "3", "POST", 9, 10, 11, "type3",
  "4", "PRE", 12, 12, 13, "type1",
  "4", "POST", 12, 13, 14, "type1",
  "5", "PRE", 13, 14, 15, "type2",
  "5", "POST", 15, 16, 17, "type2",
  "6", "PRE", 16, 17, 18, "type3",
  "6", "POST", 19, 10, 11, "type3"
)

I would like to run the following function over it, such that I get a linear model for each "type" and within each type, a result for each outcome ("outcome1", "outcome2", "outcome3")

model_function <- function(outcome, df) {
  fit <- lmer(as.formula(paste0(outcome, "~ RANGE + (1|PERSON_ID)")), data = df)
  return(summary(fit))
}

This is what I have tried, but get the following error: "Error: The result of .f should be a data frame."

outcomes <- purrr::set_names(names(example)[c(3,4,5)])

example %>% 
  dplyr::group_by(type) %>% 
  group_modify(~model_function(outcome = outcomes, df = .x))

Any help much appreciated... Ideally the end result would be a table that looks like this:

outcome1 outcome2 outcome3
type1 coefficient; pvalue coefficient; pvalue coefficient; pvalue
type2 coefficient; pvalue coefficient; pvalue coefficient; pvalue
type3 coefficient; pvalue coefficient; pvalue coefficient; pvalue
Dr Wampa
  • 443
  • 3
  • 8
  • 1
    There is a lot going on here. It looks like you'll need some sort of inner loop to loop through the 3 columns for each type group within `group_modify()`. Also `group_modify()` expects a data.frame as output so your function should output a data.frame (there are some examples in the documentation "Examples"). Question: do you literally want output where the cell of a data.frame has the coefficient and the p-value in it separated by a semi-colon? Or is this short-hand for some sort of nested data.frame? Also, are you planning on getting p-values via **lmerTest**? – aosmith Aug 04 '21 at 21:29

2 Answers2

2

Assuming you are using library(lmerTest) to get your P.values, here is a potential solution utilizing library(broom.mixed):

library(tidyverse)
library(lmerTest)
library(broom.mixed)

example <- tribble(
  ~PERSON_ID, ~RANGE, ~outcome1, ~outcome2, ~outcome3, ~type,
  "1", "PRE", 1, 2, 3, "type1",
  "1", "POST", 2, 3, 4, "type1",
  "2", "PRE", 3, 4, 5, "type2",
  "2", "POST", 5, 6, 7, "type2",
  "3", "PRE", 6, 7, 8, "type3",
  "3", "POST", 9, 10, 11, "type3",
  "4", "PRE", 12, 12, 13, "type1",
  "4", "POST", 12, 13, 14, "type1",
  "5", "PRE", 13, 14, 15, "type2",
  "5", "POST", 15, 16, 17, "type2",
  "6", "PRE", 16, 17, 18, "type3",
  "6", "POST", 19, 10, 11, "type3"
)

example |> 
  nest(data = -type) |> 
  mutate(
    m1 = map(data, ~ lmer(outcome1 ~ RANGE + (1|PERSON_ID), data = .x)),
    m2 = map(data, ~ lmer(outcome2 ~ RANGE + (1|PERSON_ID), data = .x)),
    m3 = map(data, ~ lmer(outcome3 ~ RANGE + (1|PERSON_ID), data = .x)),
    across(m1:m3, map, ~ .x |> tidy() |> select(estimate, p.value) |> slice(2))
  ) |> 
  select(-data) |> 
  unnest(everything(), names_sep = "_")

#> # A tibble: 3 × 7
#>   type_type m1_estimate m1_p.value m2_estimate m2_p.value m3_estimate m3_p.value
#>   <chr>           <dbl>      <dbl>       <dbl>      <dbl>       <dbl>      <dbl>
#> 1 type1            -0.5      0.500       -1      3.68e-10       -1      8.83e- 1
#> 2 type2            -2        0.888       -2      9.01e- 1       -2      2.70e-10
#> 3 type3            -3        0.887        2.00   7.28e- 1        2.00   7.28e- 1

Created on 2021-08-04 by the reprex package (v2.0.0)

tomasu
  • 1,388
  • 9
  • 11
1

Here's an option using an alternative to your model_function function, where I do an inner map_dfc() loop within group_modify() to loop through all outcome variables for each group.

Note your function must return a data.frame for group_modify() to work properly. I modified it to return a data.frame containing just the coefficient and p-value from the test for differences in the fixed effects. I include example output for one model.

Note I used lmerTest for p-values and broom.mixed for convenient output. However, you can pull coefficients and p-values manually out of the summary output, as well.

library(lme4)
library(lmerTest)
library(broom.mixed)
library(purrr)
library(dplyr)
example <- tibble::tribble(
    ~PERSON_ID, ~RANGE, ~outcome1, ~outcome2, ~outcome3, ~type,
    "1", "PRE", 1, 2, 3, "type1",
    "1", "POST", 2, 3, 4, "type1",
    "2", "PRE", 3, 4, 5, "type2",
    "2", "POST", 5, 6, 7, "type2",
    "3", "PRE", 6, 7, 8, "type3",
    "3", "POST", 9, 10, 11, "type3",
    "4", "PRE", 12, 12, 13, "type1",
    "4", "POST", 12, 13, 14, "type1",
    "5", "PRE", 13, 14, 15, "type2",
    "5", "POST", 15, 16, 17, "type2",
    "6", "PRE", 16, 17, 18, "type3",
    "6", "POST", 19, 10, 11, "type3"
)

model_function <- function(outcome, df) {
    fit <- lmer(as.formula(paste0(outcome, "~ RANGE + (1|PERSON_ID)")), data = df)
    out = tidy(fit, effects = "fixed")[2, c("estimate", "p.value")]
    names(out) = paste(outcome, names(out), sep = "_")
    out
}

# Check what output looks like
model_function("outcome1", df = filter(example, type == "type1"))
#> # A tibble: 1 x 2
#>   outcome1_estimate outcome1_p.value
#>               <dbl>            <dbl>
#> 1              -0.5            0.500

# variables to loop through
outcomes <- purrr::set_names(names(example)[c(3,4,5)])

example %>% 
    dplyr::group_by(type) %>% 
    group_modify(~ {
        map_dfc(outcomes, function(outcome) model_function(outcome = outcome, df = .x))
    })
#> # A tibble: 3 x 7
#> # Groups:   type [3]
#>   type  outcome1_estimate outcome1_p.value outcome2_estimate outcome2_p.value
#>   <chr>             <dbl>            <dbl>             <dbl>            <dbl>
#> 1 type1              -0.5            0.500             -1      0.000000000368
#> 2 type2              -2.             0.888             -2      0.901         
#> 3 type3              -3.             0.887              2.00   0.728         
#> # ... with 2 more variables: outcome3_estimate <dbl>, outcome3_p.value <dbl>

Created on 2021-08-05 by the reprex package (v2.0.0)

aosmith
  • 34,856
  • 9
  • 84
  • 118