7

Reproducible example

library(dplyr)

cats <-
  data.frame(
    name = c(letters[1:10]),
    weight = c(rnorm(5, 10, 1), rnorm(5, 20, 3)),
    type = c(rep("not_fat", 5), rep("fat", 5))
  )

get_means <- function(df, metric, group) {
  df %>%
    group_by(.[[group]]) %>%
    mutate(mean_stat = mean(.[[metric]])) %>%
    pull(mean_stat) %>%
    unique()
}

get_means(cats, metric = "weight", group = "type")

What I tried

I expect to get two values back, but instead I get one. It appears that the group_by() is failing.

I tried everything I could think of, including quo(), eval(), substitute(), UQ(), !!, and a whole host of other things, to make the code inside group_by() work.

This seems awfully simple but I can't figure it out.

Reasoning for code

The variables are passed as quoted strings because I am using them in ggplot aes_string() calls. I excluded the ggplot code from the function to simplify the example; otherwise it'd be easy because we could use standard evaluation.

eipi10
Robert Tan

6 Answers

7

I think the "intended" way to do this in the tidyeval framework is to enter the arguments as names (rather than strings) and then quote the arguments using enquo(). ggplot2 understands tidy evaluation operators so this works for ggplot2 as well.

First, let's adapt the dplyr summary function in your example:

library(tidyverse)
library(rlang)

get_means <- function(df, metric, group) {

  metric = enquo(metric)
  group = enquo(group)

  df %>%
    group_by(!!group) %>%
    summarise(!!paste0("mean_", as_label(metric)) := mean(!!metric))
}

get_means(cats, weight, type)
  type    mean_weight
1 fat            20.0
2 not_fat        10.2
get_means(iris, Petal.Width, Species)
  Species    mean_Petal.Width
1 setosa                0.246
2 versicolor            1.33 
3 virginica             2.03

Now add in ggplot:

get_means <- function(df, metric, group) {

  metric = enquo(metric)
  group = enquo(group)

  df %>%
    group_by(!!group) %>%
    summarise(mean_stat = mean(!!metric)) %>% 
    ggplot(aes(!!group, mean_stat)) + 
      geom_point()
}

get_means(cats, weight, type)

I'm not sure what type of plot you have in mind, but you can plot the data and summary values using tidy evaluation. For example:

plot_func = function(data, metric, group) {

  metric = enquo(metric)
  group = enquo(group)

  data %>% 
    ggplot(aes(!!group, !!metric)) + 
      geom_point() +
      geom_point(data=. %>% 
                   group_by(!!group) %>%
                   summarise(!!metric := mean(!!metric)),
                 shape="_", colour="red", size=8) + 
      expand_limits(y=0) +
      scale_y_continuous(expand=expansion(mult=c(0,0.02)))  # expansion() replaces the deprecated expand_scale()
}

plot_func(cats, weight, type)

FYI, you can allow the function to take any number of grouping variables (including none) using the ... argument and enquos instead of enquo (which also requires the use of !!! (unquote-splice) instead of !! (unquote)).

get_means <- function(df, metric, ...) {

  metric = enquo(metric)
  groups = enquos(...)

  df %>%
    group_by(!!!groups) %>%
    summarise(!!paste0("mean_", as_label(metric)) := mean(!!metric))
}
get_means(mtcars, mpg, cyl, vs)
    cyl    vs mean_mpg
1     4     0     26  
2     4     1     26.7
3     6     0     20.6
4     6     1     19.1
5     8     0     15.1
get_means(mtcars, mpg)
  mean_mpg
1     20.1
eipi10
  • 2
    Nice answer! Just note that `quo_text()` is not appropriate in that context. It's a multi-line deparser. You can use `as_label()` or `as_name()` instead which are guaranteed to return a single-line string. The latter checks that its input is a variable name and not a function call, which is appropriate in many cases. Here `as_label()` would be fine because your function accepts inline transformations of variables, e.g. you can pass `get_means(mtcars, mpg * 100)`. – Lionel Henry Mar 30 '19 at 01:30
  • Thanks @lionel. Follow-up question: Like `quo_text()`, `as_label()` is an `rlang` function. My (potentially incorrect) impression is that the "average" tidyeval user shouldn't need to resort to `rlang` functions in the normal course of programming. Is there a way to generate dynamic compound column names using only functions in the standard `tidyverse` packages? – eipi10 Apr 01 '19 at 17:48
  • 1
    You're right. We are going to export `as_label()` in the tidyverse packages as well. – Lionel Henry Apr 02 '19 at 09:43
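
To illustrate the distinction drawn in the comments above, here is a minimal sketch comparing `as_label()` and `as_name()` on a bare column name and on an inline transformation. The object names `q_name`, `q_call`, and `name_result` are made up for this example.

```r
library(rlang)

# A quosure wrapping a bare column name
q_name <- quo(mpg)
# A quosure wrapping an inline transformation of a column
q_call <- quo(mpg * 100)

# Both functions return a single-line string for a simple name
label_of_name <- as_label(q_name)
name_of_name  <- as_name(q_name)

# as_label() also accepts a call and deparses it to one line
label_of_call <- as_label(q_call)

# as_name() only accepts a variable name, so a call raises an error;
# the error is caught here so the script keeps running
name_result <- tryCatch(
  as_name(q_call),
  error = function(e) "as_name() rejected the call"
)
```

This is why `as_label()` suits a function like `get_means()` that should accept `mpg * 100`, while `as_name()` is the stricter choice when only plain column names make sense.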
4

If you want to use strings for the names, as in your example, the correct way to do this is to convert the string to a symbol with sym and unquote with !!:

get_means <- function(df, metric, group) {
    df %>%
      group_by(!!sym(group)) %>%
      mutate(mean_stat = mean(!!sym(metric))) %>%
      pull(mean_stat) %>%
      unique()
}

get_means(cats, metric = "weight", group = "type")
[1] 10.06063 17.45906

If you want to use bare names in your function, then use enquo with !!:

get_means <- function(df, metric, group) {
    group <- enquo(group)
    metric <- enquo(metric)
    df %>%
      group_by(!!group) %>%
      mutate(mean_stat = mean(!!metric)) %>%
      pull(mean_stat) %>%
      unique()
}

get_means(cats, metric = weight, group = type)
[1] 10.06063 17.45906

What is happening in your example?

Interestingly, .[[group]] does work for grouping, but not the way you think. It extracts the named column of the data frame as a vector, then adds that vector as a new variable and groups on it:

cats %>%
    group_by(.[['type']])

# A tibble: 10 x 4
# Groups:   .[["type"]] [2]
   name  weight type    `.[["type"]]`
   <fct>  <dbl> <fct>   <fct>        
 1 a       9.60 not_fat not_fat      
 2 b       8.71 not_fat not_fat      
 3 c      12.0  not_fat not_fat      
 4 d       8.48 not_fat not_fat      
 5 e      11.5  not_fat not_fat      
 6 f      17.0  fat     fat          
 7 g      20.3  fat     fat          
 8 h      17.3  fat     fat          
 9 i      15.3  fat     fat          
10 j      17.4  fat     fat  

Your problem comes with the mutate statement. Instead of selecting the weight values within each group, mutate(mean_stat = mean(.[['weight']])) simply extracts the entire weight column as a vector, computes its mean, and then assigns that single value to every row of the new column:

cats %>%
    group_by(.[['type']]) %>%
      mutate(mean_stat = mean(.[['weight']]))
# A tibble: 10 x 5
# Groups:   .[["type"]] [2]
   name  weight type    `.[["type"]]` mean_stat
   <fct>  <dbl> <fct>   <fct>             <dbl>
 1 a       9.60 not_fat not_fat            13.8
 2 b       8.71 not_fat not_fat            13.8
 3 c      12.0  not_fat not_fat            13.8
 4 d       8.48 not_fat not_fat            13.8
 5 e      11.5  not_fat not_fat            13.8
 6 f      17.0  fat     fat                13.8
 7 g      20.3  fat     fat                13.8
 8 h      17.3  fat     fat                13.8
 9 i      15.3  fat     fat                13.8
10 j      17.4  fat     fat                13.8
divibisan
  • 1
    When is `sym` needed or not needed? For example, you can do `group_by(!!group)` and it seems to work. – thc Mar 29 '19 at 22:16
  • 1
    `sym` turns strings into symbols that you can unquote; `enquo` captures bare names passed into a function as quosures that you can unquote. So if you pass `"type"` into the function, you need `sym`, but if you pass `type` in, use `enquo` – divibisan Mar 29 '19 at 23:05
  • I think this got answered first. So, commenting here. It would be better to also show a function that can take both quoted and unquoted arguments – akrun Mar 30 '19 at 01:11
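
One possible way to accept both quoted and unquoted arguments, as suggested in the last comment, is `rlang::ensym()`, which captures either a bare name or a string literal as a symbol. This is a sketch, not part of the answer above; the name `get_means_flex` and the small `cats_demo` data frame are made up for illustration.

```r
library(dplyr)
library(rlang)

# Hypothetical variant of get_means() that accepts bare names or strings
get_means_flex <- function(df, metric, group) {
  # ensym() turns a bare name or a string literal into a symbol
  metric <- ensym(metric)
  group  <- ensym(group)

  df %>%
    group_by(!!group) %>%
    summarise(mean_stat = mean(!!metric)) %>%
    pull(mean_stat)
}

# Small deterministic data set (illustrative values only)
cats_demo <- data.frame(
  name   = letters[1:4],
  weight = c(9, 11, 19, 21),
  type   = c("not_fat", "not_fat", "fat", "fat")
)

from_bare    <- get_means_flex(cats_demo, weight, type)
from_strings <- get_means_flex(cats_demo, "weight", "type")
```

Both calls should return the same vector. One caveat: `ensym()` captures what is typed at the call site, so passing a variable that holds a string (for example `m <- "weight"; get_means_flex(cats_demo, m, type)`) would look for a column named `m`. For that case `.data[[m]]` or `!!sym(m)` is the safer route.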
3

The magrittr pronoun . represents the whole data frame, so you've taken the mean of all observations. Instead, use the tidy eval pronoun .data, which represents the slice of the data frame for the current group:

get_means <- function(df, metric, group) {
  df %>%
    group_by(.data[[group]]) %>%
    mutate(mean_stat = mean(.data[[metric]])) %>%
    pull(mean_stat) %>%
    unique()
}
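
The difference between the two pronouns can be seen side by side in a small sketch. The data frame `cats_demo` and the column names `mean_dot` and `mean_data` are made up for this illustration.

```r
library(dplyr)

# Small deterministic data set (illustrative values only)
cats_demo <- data.frame(
  weight = c(9, 11, 19, 21),
  type   = c("not_fat", "not_fat", "fat", "fat")
)

res <- cats_demo %>%
  group_by(type) %>%
  mutate(
    # `.` is the whole piped data frame, so this is the overall mean
    mean_dot  = mean(.[["weight"]]),
    # `.data` is restricted to the current group, so this is the group mean
    mean_data = mean(.data[["weight"]])
  )
```

Here `mean_dot` holds the same value in every row, while `mean_data` differs between the two groups.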
Lionel Henry
2

I would go with a slight modification (if I understand correctly what you would like to achieve):

get_means <- function(df, metric, group) {
  df %>%
    group_by(!!sym(group)) %>%
    summarise(mean_stat = mean(!!sym(metric))) %>%
    pull(mean_stat)
}

get_means(cats, "weight", "type")

[1] 20.671772  9.305811

This gives exactly the same output as:

cats %>% group_by(type) %>% summarise(mean_stat=mean(weight)) %>%
  pull(mean_stat)

[1] 20.671772  9.305811
piotr
  • 1
    While equivalent most of the time, it is a bit safer to subset `.data` instead of unquoting symbols. That's because `.data[[col]]` checks that the variable exists in the data frame. Also it looks simpler to people who are not accustomed to tidy eval. – Lionel Henry Mar 30 '19 at 01:24
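
The safety point in the comment above can be sketched as follows. The misspelled variable `wieght` and the data frame `cats_demo` are made up for the illustration: an unquoted symbol that is not a column falls back to the calling environment, whereas `.data[[col]]` only looks inside the data frame.

```r
library(dplyr)
library(rlang)

# Small deterministic data set (illustrative values only)
cats_demo <- data.frame(
  weight = c(9, 11, 19, 21),
  type   = c("not_fat", "not_fat", "fat", "fat")
)

# A stray variable in the calling environment (note the deliberate typo)
wieght <- c(1, 1, 1, 1)

# A misspelled column name passed as a string
col <- "wieght"

# Unquoting a symbol: no such column, so the environment variable is used
via_sym <- cats_demo %>%
  summarise(m = mean(!!sym(col))) %>%
  pull(m)

# Subsetting .data: only columns are considered, so this raises an error,
# which is caught here so the script keeps running
via_data <- tryCatch(
  cats_demo %>% summarise(m = mean(.data[[col]])) %>% pull(m),
  error = function(e) "column not found"
)
```

The first call silently returns the mean of the unrelated vector, while the second fails loudly, which is usually what you want inside a function.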
0

Using the *_at functions:

library(dplyr)
get_means <- function(df, metric, group) {
  df %>%
    group_by_at(group) %>%
    mutate_at(metric,list(mean_stat = mean)) %>%
    pull(mean_stat) %>%
    unique()
}

get_means(cats, metric = "weight", group = "type")
# [1] 10.12927 20.40541

data

set.seed(1)
cats <-
  data.frame(
    name = c(letters[1:10]),
    weight = c(rnorm(5, 10, 1), rnorm(5, 20, 3)),
    type = c(rep("not_fat", 5), rep("fat", 5))
  )
moodymudskipper
0

Updated answer using across(), .data, and {} for renaming, while keeping the original function arguments as strings per the OP:

library(tidyverse)

get_means <- function(dat = mtcars, metric = "wt", group = "cyl") {
  dat %>%
    group_by(across(all_of(c(group)))) %>%
    summarise("{paste0('mean_',metric)}" := mean(.data[[metric]]), .groups="keep")
}

get_means()

See ?dplyr_data_masking for a more detailed discussion.
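
As a usage sketch of the same idea: because all_of() accepts a character vector, the string-based approach extends to several grouping columns. The function name `get_means_multi`, the shorter glue string `"mean_{metric}"`, and the choice of `.groups = "drop"` are assumptions made for this example, not part of the answer above.

```r
library(dplyr)

# Hypothetical variant taking a character vector of grouping columns
get_means_multi <- function(dat, metric, group) {
  dat %>%
    # all_of() selects every column named in the character vector
    group_by(across(all_of(group))) %>%
    # the glue string builds the output column name from `metric`
    summarise("mean_{metric}" := mean(.data[[metric]]), .groups = "drop")
}

# Two grouping columns passed as strings
res <- get_means_multi(mtcars, metric = "mpg", group = c("cyl", "vs"))
```

The result has one row per observed combination of cyl and vs, with the summary column named mean_mpg.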

Brian D