I am working on a custom function that runs a function (..f) for every combination of the grouping variables supplied via grouping.vars for a given dataframe, and then tidies the results into a dataframe using the broom package.

Here is the function I've written. Note that the arguments in ... are passed on to ..f, while additional arguments for the broom::tidy method are supplied via the tidy.args list.

# setup
set.seed(123)
library(tidyverse)
options(pillar.sigfig = 8)

# custom function
grouped_tidy <- function(data,
                         grouping.vars,
                         ..f,
                         ...,
                         tidy.args = list()) {
  # check how many variables were entered for grouping variable vector
  grouping.vars <-
    as.list(rlang::quo_squash(rlang::enquo(grouping.vars)))
  grouping.vars <-
    if (length(grouping.vars) == 1) {
      grouping.vars
    } else {
      grouping.vars[-1]
    }

  # quote all arguments to `..f`
  dots <- rlang::enquos(...)

  # running the grouped analysis
  df_results <- data %>%
    dplyr::group_by(.data = ., !!!grouping.vars, .drop = TRUE) %>%
    dplyr::group_map(
      .tbl = ., 
      .f = ~ broom::tidy(
        x = rlang::exec(.fn = ..f, !!!dots, data = .x),
        unlist(tidy.args)
    ))

  # return the final dataframe with results
  return(df_results)
}

As the examples below show, the function runs, but I doubt that the tidy.args list is being evaluated properly: irrespective of which conf.level I choose, the confidence limits are identical to the 4th decimal place.

  • 95% CI
# using the function to get 95% CI
grouped_tidy(
  data = ggplot2::diamonds,
  grouping.vars = c(cut),
  ..f = stats::lm,
  formula = price ~ carat - 1,
  tidy.args = list(conf.int = TRUE, conf.level = 0.95)
)

#> # A tibble: 5 x 8
#> # Groups:   cut [5]
#>   cut       term   estimate std.error statistic p.value  conf.low conf.high
#>   <ord>     <chr>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>     <dbl>
#> 1 Fair      carat 4510.7919 42.614474 105.85117       0 4427.2062 4594.3776
#> 2 Good      carat 5260.8494 27.036670 194.58200       0 5207.8454 5313.8534
#> 3 Very Good carat 5672.5054 18.675939 303.73334       0 5635.8976 5709.1132
#> 4 Premium   carat 5807.1392 16.836474 344.91422       0 5774.1374 5840.1410
#> 5 Ideal     carat 5819.4837 15.178657 383.39911       0 5789.7324 5849.2350
  • 99% CI
# using the function to get 99% CI
grouped_tidy(
  data = ggplot2::diamonds,
  grouping.vars = c(cut),
  ..f = stats::lm,
  formula = price ~ carat - 1,
  tidy.args = list(conf.int = TRUE, conf.level = 0.99)
)

#> # A tibble: 5 x 8
#> # Groups:   cut [5]
#>   cut       term   estimate std.error statistic p.value  conf.low conf.high
#>   <ord>     <chr>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>     <dbl>
#> 1 Fair      carat 4510.7919 42.614474 105.85117       0 4427.2062 4594.3776
#> 2 Good      carat 5260.8494 27.036670 194.58200       0 5207.8454 5313.8534
#> 3 Very Good carat 5672.5054 18.675939 303.73334       0 5635.8976 5709.1132
#> 4 Premium   carat 5807.1392 16.836474 344.91422       0 5774.1374 5840.1410
#> 5 Ideal     carat 5819.4837 15.178657 383.39911       0 5789.7324 5849.2350

Any idea on how I can change the function so that the list of arguments will be evaluated properly by broom::tidy?

Ben G
Indrajeet Patil
  • What is the purpose of `quo_squash`? I'm just starting to get my head around tidyeval, but how does it work with `enquo`? – Ben G Feb 22 '19 at 18:01
  • As the documentation for this function notes, it flattens all nested quosures within an expression. This is desirable here because of the different ways users are expected to supply arguments to `grouping.vars`. See this gist: https://gist.github.com/IndrajeetPatil/c53748c25224c12172f0b610d122b506 – Indrajeet Patil Feb 22 '19 at 18:37
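
To make the `grouping.vars[-1]` step in the question more concrete, here is a rough base-R analog of what happens after the expression is captured and passed to `as.list()`. This is only a sketch: `capture_vars` is a hypothetical helper, and `substitute()` stands in for `rlang::quo_squash(rlang::enquo(...))`, so it ignores quosure environments entirely.

```r
# Hypothetical helper: base-R stand-in for the capture step in the question.
capture_vars <- function(grouping.vars) {
  # grab the unevaluated expression the caller typed
  expr <- substitute(grouping.vars)

  # a bare symbol becomes a one-element list;
  # a call such as c(cut, color) becomes list(`c`, cut, color)
  parts <- as.list(expr)

  # drop the leading function name `c` when a call was supplied
  if (length(parts) == 1) parts else parts[-1]
}

single <- capture_vars(cut)
multiple <- capture_vars(c(cut, color))

# show the captured symbols as strings
print(vapply(single, as.character, character(1)))
print(vapply(multiple, as.character, character(1)))
```

Both `cut` and `c(cut, color)` end up as a list of symbols, which is what allows them to be spliced into `group_by()` with `!!!`.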

1 Answer

set.seed(123)
library(tidyverse)
options(pillar.sigfig = 8)

grouped_tidy <- function(data,
                         grouping.vars,
                         ..f,
                         ...,
                         tidy.args = list()) {

  # functions passed to group_map must accept
  # .x and .y arguments, where .x is the data

  tidy_group <- function(.x, .y) {

    # presumes ..f won't explode if called with these args
    model <- ..f(..., data = .x)

    # mild variation on do.call to call function with
    # list of arguments
    rlang::exec(broom::tidy, model, !!!tidy.args)
  }

  data %>%
    group_by(!!!grouping.vars, .drop = TRUE) %>%
    group_map(tidy_group) %>% 
    ungroup()  # don't get bitten by groups downstream
}

grouped_tidy(
  data = ggplot2::diamonds,

  # wrap grouping columns in vars() like in scoped dplyr verbs
  grouping.vars = vars(cut),  

  ..f = stats::lm,
  formula = price ~ carat - 1,
  tidy.args = list(conf.int = TRUE, conf.level = 0.95)
)
#> # A tibble: 5 x 8
#>   cut       term   estimate std.error statistic p.value  conf.low conf.high
#>   <ord>     <chr>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>     <dbl>
#> 1 Fair      carat 4510.7919 42.614474 105.85117       0 4427.2062 4594.3776
#> 2 Good      carat 5260.8494 27.036670 194.58200       0 5207.8454 5313.8534
#> 3 Very Good carat 5672.5054 18.675939 303.73334       0 5635.8976 5709.1132
#> 4 Premium   carat 5807.1392 16.836474 344.91422       0 5774.1374 5840.1410
#> 5 Ideal     carat 5819.4837 15.178657 383.39911       0 5789.7324 5849.2350
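
The likely reason the version in the question ignores `conf.level` is that `unlist(tidy.args)` collapses the list into a single named vector, which is then passed as one positional argument instead of being spliced into separate named arguments. The base-R sketch below illustrates the difference. `show_args` is a hypothetical stand-in that only mimics the leading arguments I believe the `lm` tidier has (`x`, `conf.int`, `conf.level`); it is not broom's actual code, and `do.call()` plays the role of `rlang::exec(..., !!!tidy.args)`.

```r
# extra arguments, as they would be supplied via `tidy.args`
tidy.args <- list(conf.int = TRUE, conf.level = 0.99)

# unlist() flattens the list into ONE named numeric vector (TRUE becomes 1)
flat <- unlist(tidy.args)
print(flat)

# Hypothetical stand-in for a tidier; it just reports what it received.
show_args <- function(x, conf.int = FALSE, conf.level = 0.95, ...) {
  list(conf.int = conf.int, conf.level = conf.level)
}

# flattened vector passed positionally: all of it lands in `conf.int`,
# and `conf.level` silently keeps its default
wrong <- show_args("model", flat)

# list spliced into the call: each element becomes its own named argument
right <- do.call(show_args, c(list("model"), tidy.args))

str(wrong)
str(right)
```

In the `wrong` call the requested level never reaches `conf.level`, which would be consistent with always getting 95% intervals regardless of the value supplied.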

Created on 2019-02-23 by the reprex package (v0.2.1)
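
A note for anyone running this on a newer dplyr: from dplyr 0.8.1 onward, `group_map()` returns a plain list, and the data-frame-returning behaviour used above lives in `group_modify()`. A minimal adaptation under that assumption is sketched below. It uses `mtcars` rather than `diamonds` only so that ggplot2 is not required, and it has not been checked against every dplyr or broom release.

```r
library(dplyr)

grouped_tidy <- function(data,
                         grouping.vars,
                         ..f,
                         ...,
                         tidy.args = list()) {
  # fit the model on one group's data, then tidy it
  tidy_group <- function(.x, .y) {
    model <- ..f(..., data = .x)
    # splice the list so each element becomes a named argument of tidy()
    rlang::exec(broom::tidy, model, !!!tidy.args)
  }

  data %>%
    group_by(!!!grouping.vars, .drop = TRUE) %>%
    group_modify(tidy_group) %>% # returns a grouped data frame
    ungroup()
}

# same call with two confidence levels
res95 <- grouped_tidy(
  data = mtcars,
  grouping.vars = vars(cyl),
  ..f = stats::lm,
  formula = mpg ~ wt - 1,
  tidy.args = list(conf.int = TRUE, conf.level = 0.95)
)

res99 <- grouped_tidy(
  data = mtcars,
  grouping.vars = vars(cyl),
  ..f = stats::lm,
  formula = mpg ~ wt - 1,
  tidy.args = list(conf.int = TRUE, conf.level = 0.99)
)

# the 99% limits should now be wider than the 95% ones
print(select(res95, cyl, conf.low, conf.high))
print(select(res99, cyl, conf.low, conf.high))
```

Comparing the two results is a quick way to confirm that `conf.level` is actually reaching `broom::tidy()`.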

alexpghayes