
Let's say I'd like to calculate the mean, min and max for an arbitrary number of grouping variables within a custom function.

The toy data looks like this:

library(tidyverse)
df <- tibble(
  Gender = c("m", "f", "f", "m", "m", 
             "f", "f", "f", "m", "f"),
  IQ = rnorm(10, 100, 15),
  Other = runif(10),
  Test = rnorm(10),
  group2 = c("A", "A", "A", "A", "A",
             "B", "B", "B", "B", "B")
)

To do this for two grouping variables (Gender and group2), I could use

df %>% 
  gather(Variable, Value, -c(Gender, group2)) %>% 
  group_by(Gender, group2, Variable) %>% 
  summarise(mean = mean(Value), 
            min = min(Value), 
            max = max(Value)) 

which can be wrapped in a function using the new curly-curly operator ({{ }}) from rlang:

descriptive_by <- function(data, group1, group2) {
  data %>% 
    gather(Variable, Value, -c({{ group1 }}, {{ group2 }})) %>% 
    group_by({{ group1 }}, {{ group2 }}, Variable) %>% 
    summarise(mean = mean(Value), 
              min = min(Value), 
              max = max(Value))
}
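For comparison, rlang documents `{{ x }}` as a combination of `enquo()` and `!!`. The same function can therefore be written with an explicit capture-and-unquote step. The sketch below assumes dplyr >= 1.0.0 (for `.groups`) and tidyr's `gather()`. `descriptive_by_quo()` is a hypothetical name chosen here.

```r
library(dplyr)
library(tidyr)
library(rlang)

set.seed(1)
df <- tibble(
  Gender = c("m", "f", "f", "m", "m", "f", "f", "f", "m", "f"),
  IQ = rnorm(10, 100, 15),
  Other = runif(10),
  Test = rnorm(10),
  group2 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B")
)

descriptive_by_quo <- function(data, group1, group2) {
  # Capture the caller's bare column names as quosures
  group1 <- enquo(group1)
  group2 <- enquo(group2)

  data %>%
    # Unquote with !! wherever {{ }} was used before
    gather(Variable, Value, -c(!!group1, !!group2)) %>%
    group_by(!!group1, !!group2, Variable) %>%
    summarise(mean = mean(Value),
              min = min(Value),
              max = max(Value),
              .groups = "drop")
}

descriptive_by_quo(df, Gender, group2)
```

Both versions still name each grouping argument explicitly. That fixed list of arguments is what the question below tries to replace with `...`.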

I would have assumed that I could replace the named group arguments with ..., but that doesn't seem to work:

descriptive_by <- function(data, ...) {
  data %>% 
    gather(Variable, Value, -c(...)) %>% 
    group_by(..., Variable) %>% 
    summarise(mean = mean(Value), 
              min = min(Value), 
              max = max(Value))
}

because it returns the error

Error in map_lgl(.x, .p, ...) : object 'Gender' not found

j3ypi

2 Answers

Here is one possible solution: the ... are passed on to group_by() directly, and gather() only stacks the numeric columns. Presumably the non-numeric columns should never be gathered, whatever is passed in ....

library(tidyverse)

set.seed(1)

## data
df <- tibble(
    Gender = c("m", "f", "f", "m", "m", 
        "f", "f", "f", "m", "f"),
    IQ = rnorm(10, 100, 15),
    Other = runif(10),
    Test = rnorm(10),
    group2 = c("A", "A", "A", "A", "A",
        "B", "B", "B", "B", "B")
)

## function
descriptive_by <- function(data, ...) {
  data %>% 
    # gather only the numeric columns; the ... go straight to group_by()
    gather(Variable, Value, names(select_if(., is.numeric))) %>% 
    group_by(..., Variable) %>% 
    summarise(mean = mean(Value), 
              min = min(Value), 
              max = max(Value))
}

descriptive_by(df, Gender, group2)
#> # A tibble: 12 x 6
#> # Groups:   Gender, group2 [4]
#>    Gender group2 Variable    mean      min     max
#>    <chr>  <chr>  <chr>      <dbl>    <dbl>   <dbl>
#>  1 f      A      IQ        95.1    87.5    103.   
#>  2 f      A      Other      0.432   0.212    0.652
#>  3 f      A      Test       0.464  -0.0162   0.944
#>  4 f      B      IQ       100.     87.7    111.   
#>  5 f      B      Other      0.281   0.0134   0.386
#>  6 f      B      Test       0.599   0.0746   0.919
#>  7 m      A      IQ       106.     90.6    124.   
#>  8 m      A      Other      0.442   0.126    0.935
#>  9 m      A      Test       0.457  -0.0449   0.821
#> 10 m      B      IQ       109.    109.     109.   
#> 11 m      B      Other      0.870   0.870    0.870
#> 12 m      B      Test      -1.99   -1.99    -1.99
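tidyr has since superseded `gather()` with `pivot_longer()`. The sketch below applies the same idea with the newer function. It assumes tidyr >= 1.1 and dplyr >= 1.0 (for `where()` and `.groups`). Like the version above, it would also stack a numeric grouping column, so it assumes the grouping variables are non-numeric.

```r
library(dplyr)
library(tidyr)

set.seed(1)
df <- tibble(
  Gender = c("m", "f", "f", "m", "m", "f", "f", "f", "m", "f"),
  IQ = rnorm(10, 100, 15),
  Other = runif(10),
  Test = rnorm(10),
  group2 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B")
)

descriptive_by <- function(data, ...) {
  data %>%
    # Stack only the numeric columns into Variable/Value pairs
    pivot_longer(where(is.numeric), names_to = "Variable", values_to = "Value") %>%
    # The ... are forwarded unchanged, exactly as in the gather() version
    group_by(..., Variable) %>%
    summarise(mean = mean(Value),
              min = min(Value),
              max = max(Value),
              .groups = "drop")
}

descriptive_by(df, Gender, group2)
```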
Joris C.
  • Nice, but I'd like to benefit from auto-completion for the function's arguments, which gets lost when I use strings. – j3ypi Jul 10 '19 at 19:32
  • There is one bracket too many in the `group_by()` statement. But it's exactly what I was looking for. Great, thanks! – j3ypi Jul 11 '19 at 08:05

The tricky part is negating the NSE variables passed through ... (turning xxx into -xxx). Here's how I would approach it:

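library(tidyverse)

desc_by <- function(dat, ...) {

  # Turn each grouping variable into a negated selection (-Species, ...)
  drops <- lapply(enquos(...), function(d) call("-", d))

  dat %>% 
    # gather every column except the grouping variables
    gather(var, val, !!!drops) %>% 
    group_by(...) %>% 
    # funs() is deprecated in dplyr; a named list of functions does the same
    summarise_at(vars(val), list(min = min, mean = mean, max = max))

}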

desc_by(head(iris), Species, Petal.Width)
# A tibble: 2 x 5
# Groups:   Species [1]
  Species Petal.Width   min  mean   max
  <fct>         <dbl> <dbl> <dbl> <dbl>
1 setosa          0.2   1.3  3.18   5.1
2 setosa          0.4   1.7  3.67   5.4

You still have to use enquos() and !!! to apply - to each variable, but otherwise the ... can be passed unchanged to group_by() and similar functions. So you don't need the new curly-curly ("mustache") operator at all.
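You can also avoid building the negated calls by hand. Let `group_by()` resolve the names, read them back as strings with `group_vars()`, and then select the remaining columns with `all_of()`. The sketch below makes these assumptions:

- dplyr >= 1.0 and tidyr >= 1.0.
- `...` contains bare column names, not new expressions such as `x = a + b`.
- All non-grouping columns share a type, so `pivot_longer()` can stack them.

`desc_by2()` is a hypothetical name.

```r
library(dplyr)
library(tidyr)

desc_by2 <- function(dat, ...) {
  # Let group_by() resolve the bare names in ..., then read them back as strings
  keys <- dat %>% group_by(...) %>% group_vars()

  dat %>%
    # Stack every column that is not a grouping column
    pivot_longer(-all_of(keys), names_to = "var", values_to = "val") %>%
    group_by(across(all_of(keys))) %>%
    summarise(min = min(val),
              mean = mean(val),
              max = max(val),
              .groups = "drop")
}

desc_by2(head(iris), Species, Petal.Width)
```

The caller still passes bare names. Only the function's internals work with the resolved strings.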

Brian