
I've read most of the documentation about tidy evaluation and programming with dplyr but cannot get my head around this (simple) problem.

I want to program with dplyr and pass column names to a function as strings.

library(dplyr)

df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = sample(5),
  b = sample(5)
)

my_summarise <- function(df, group_var) {
  df %>%
    group_by(group_var) %>%
    summarise(a = mean(a))
}

my_summarise(df, 'g1')

This fails with Error : Column 'group_var' is unknown.

What must I change inside the my_summarise function in order to make this work?

3 Answers


Convert the column name from a string to a symbol with as.name(), then pass that symbol to group_by() with the {{ }} ("curly-curly") operator, as below:

library(dplyr)

df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = sample(5),
  b = sample(5)
)

my_summarise <- function(df, group_var) {

  grp_var <- as.name(group_var)

  df %>%
    group_by({{grp_var}}) %>%
    summarise(a = mean(a))
}

my_summarise(df, 'g1')
Vishal Katti
  • Will this function then still work if I input the variable directly and not as a string? Or is a type catching needed in this function? – Fnguyen Apr 21 '20 at 17:11
  • If you are going to give the column name directly, without quotes, then you don't need the line `grp_var <- as.name(group_var)`. Your new function would be as follows: `my_summarise <- function(df, group_var) { df %>% group_by({{ group_var }}) %>% summarise(a = mean(a)) }` – Vishal Katti Apr 21 '20 at 17:25
  • Yes, but suppose I want to "proof" this function against both types of arguments (string and bare name). Would `as.name` still work, or would I need to implement a type check at the beginning of the function? This is purely out of curiosity, as I am not the OP. – Fnguyen Apr 21 '20 at 17:50
  • You can use `ensym` to cover both cases. You will need to use the `!!` (bang-bang) operator in combination with `ensym` from the rlang package. – Vishal Katti Apr 22 '20 at 10:06

We can also use ensym() with !!:

my_summarise <- function(df, group_var) {
  df %>%
    group_by(!!rlang::ensym(group_var)) %>%
    summarise(a = mean(a))
}

my_summarise(df, 'g1')
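As a quick check of the ensym() approach, the sketch below calls the same function with a quoted and with an unquoted column name. The values in column a are fixed here (instead of sample(5)) purely so that the result is reproducible; that change is an assumption of this example, not part of the original data.

```r
library(dplyr)

# Fixed values instead of sample() so the output is reproducible (assumption)
df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a  = c(1, 2, 3, 4, 5)
)

my_summarise <- function(df, group_var) {
  df %>%
    # ensym() turns either a string or a bare name into a symbol
    group_by(!!rlang::ensym(group_var)) %>%
    summarise(a = mean(a))
}

res_string <- my_summarise(df, "g1")  # column name given as a string
res_bare   <- my_summarise(df, g1)    # column name given unquoted
```

Both calls should group by g1 and return the same two-row summary.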

Another option is group_by_at():

my_summarise <- function(df, group_var) {
  df %>%
    group_by_at(vars(group_var)) %>%
    summarise(a = mean(a))
}

my_summarise(df, 'g1')
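Since dplyr 1.0.0 the scoped verbs such as group_by_at() are superseded by across(). A minimal sketch of the same idea, assuming dplyr >= 1.0.0 is installed, could look like the following; the fixed values in a and the use of .groups = "drop" are choices made for this example only. Because all_of() takes a character vector, the same function also accepts several grouping columns.

```r
library(dplyr)

# Fixed values instead of sample() so the output is reproducible (assumption)
df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a  = c(1, 2, 3, 4, 5)
)

my_summarise <- function(df, group_var) {
  df %>%
    # all_of() selects the columns named in the character vector group_var
    group_by(across(all_of(group_var))) %>%
    # .groups = "drop" returns an ungrouped tibble
    summarise(a = mean(a), .groups = "drop")
}

res_one <- my_summarise(df, "g1")           # one grouping column
res_two <- my_summarise(df, c("g1", "g2"))  # two grouping columns
```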
akrun

You can also use sym() with !!:

my_summarise <- function(df, group_var) {
  df %>%
    group_by(!!sym(group_var)) %>%
    summarise(a = mean(a))
}

my_summarise(df, 'g1')

# A tibble: 2 x 2
     g1     a
  <dbl> <dbl>
1     1  3.5 
2     2  2.67
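For comparison, a related way to use a string column name without building a symbol first is the .data pronoun, which dplyr makes available inside its verbs. The sketch below is only an illustration of that alternative, with fixed values in a (an assumption, so the result is reproducible); it is not taken from the answers above.

```r
library(dplyr)

# Fixed values instead of sample() so the output is reproducible (assumption)
df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a  = c(1, 2, 3, 4, 5)
)

my_summarise <- function(df, group_var) {
  df %>%
    # .data[[group_var]] looks up the column whose name is stored in group_var
    group_by(.data[[group_var]]) %>%
    summarise(a = mean(a))
}

res <- my_summarise(df, "g1")
```

Unlike ensym(), this form expects a string only; a bare column name would not work here.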
Kay