Suppose we have some vectors and dataframes:

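library(dplyr)  # provides tibble(), summarise(), enquo() and !!

a <- c(1, 2, 0, 1)
b <- c(6, 4)
df1 <- tibble(x = c(6, 8, 12), y = c(24, 18, 16))  # tibble() supersedes the deprecated data_frame()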

We write a function using non-standard evaluation that calculates the mean of a column of the dataframe and the mean of a vector:

calculate_means <- function(df, column, vector) {
  column <- enquo(column)
  summarise(df, mean_column = mean(!!column), mean_vector = mean(vector))
}

calculate_means(df1, x, a)
# A tibble: 1 x 2
  mean_column mean_vector
        <dbl>       <dbl>
1        8.67        1.00

calculate_means(df1, y, b)
# A tibble: 1 x 2
  mean_column mean_vector
        <dbl>       <dbl>
1        19.3        5.00

That works as expected. But what happens if we write the same function with different names for the parameters?

calculate_means <- function(df, x, y) {
  x <- enquo(x)
  summarise(df, mean_column = mean(!!x), mean_vector = mean(y))
}

calculate_means(df1, x, a)
# A tibble: 1 x 2
  mean_column mean_vector
        <dbl>       <dbl>
1        8.67        19.3

calculate_means(df1, y, b)
# A tibble: 1 x 2
  mean_column mean_vector
        <dbl>       <dbl>
1        19.3        19.3

The first parameter evaluates the same as before, but the second parameter always evaluates to the column "y" of the dataframe. Shouldn't it evaluate to the vectors "a" and "b" respectively?

bienqueda

2 Answers

We can capture the name of the object passed as y as a string, look that name up in the global environment with globalenv(), and use the resulting value in the summarise call:

calculate_means <- function(df, x, y) {
  x <- enquo(x)
  y <- quo_name(enquo(y))
  v1 <- globalenv()[[y]]

  df %>%
    summarise(mean_column = mean(!!x),
              mean_vector = mean(v1))
}

calculate_means(df1, x, a)
# A tibble: 1 x 2
#   mean_column mean_vector
#        <dbl>       <dbl>
#1        8.67        1.00

calculate_means(df1, y, b)
# A tibble: 1 x 2
#   mean_column mean_vector
#        <dbl>       <dbl>
#1        19.3        5.00

Suppose we also need the mean of the 'y' column:

calculate_means <- function(df, x, y) {
  x <- enquo(x)
  y <- quo_name(enquo(y))
  v1 <- globalenv()[[y]]

  df %>%
    summarise(mean_column = mean(!!x),
              mean_vector = mean(v1),
              mean_column2 = mean(.data$y))
}

calculate_means(df1, x, a)
# A tibble: 1 x 3
#  mean_column mean_vector mean_column2
#        <dbl>       <dbl>        <dbl>
#1        8.67        1.00         19.3
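
One caveat worth sketching: because the lookup goes through globalenv(), this approach presumably only finds vectors that are defined at the top level. The helper lookup_global and the function wrapper below are hypothetical names introduced only for illustration; they isolate the name lookup used above.

```r
library(dplyr)  # re-exports enquo() and quo_name() from rlang

# Hypothetical helper: isolates the globalenv() lookup used above.
lookup_global <- function(y) {
  name <- quo_name(enquo(y))  # name of the object passed as y, as a string
  globalenv()[[name]]         # NULL if no global object has that name
}

# Hypothetical caller whose vector is local, not global.
wrapper <- function() {
  local_vec <- c(10, 20)
  lookup_global(local_vec)
}

result <- wrapper()  # local_vec is not in the global environment
```

Here result is NULL rather than c(10, 20), so a mean computed from it would not reflect the vector that was passed in.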
akrun
  • Yes. I understand you, but I don't see how this is the expected behaviour. If we want to create a function and we want one of the parameters to be a variable, do we need to "unquo" it? If I call a function and I pass a value "a" for the parameter "y", I could understand that I'm not doing it correctly and the function throws an error. But I don't understand why it uses the name of the parameter as the value to use. – bienqueda Feb 21 '18 at 12:45
  • @bienqueda The `quo_name` step is for the `globalenv` lookup. I am not sure about the expected behavior of your function. As I understand it, you have a column with the same name as an object in the global env, and you wanted the mean of the global variable instead of the column. Regarding `enquo`, it converts the symbol to a quosure – akrun Feb 21 '18 at 12:48

Variables in the actual arguments to summarise are first looked up in the data frame specified in the first argument to summarise and are only looked up in the caller to summarise if not found in that data frame. Thus by hard coding y into a summarise argument it will always look for it in df1 first.
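
A minimal sketch of this lookup order, reusing df1 from the question. The top-level vectors y and z are assumptions introduced only for illustration: y shares its name with a column, z does not.

```r
library(dplyr)

df1 <- tibble(x = c(6, 8, 12), y = c(24, 18, 16))

y <- c(6, 4)        # same name as a column of df1
z <- c(1, 2, 0, 1)  # no column of df1 has this name

# y is found in df1 first, so the column is used;
# z is not a column, so the lookup falls back to the calling environment.
res <- summarise(df1, mean_y = mean(y), mean_z = mean(z))
res
```

mean_y is the mean of the column (about 19.3), not of the vector c(6, 4), while mean_z is the mean of the top-level vector.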

1) We can use !! to avoid this. The argument to !! is not looked up in the data argument.

calculate_means2 <- function(df, x, y) {
  x <- enquo(x)
  summarise(df, mean_column = mean(!!x), mean_vector = mean(!!y))
}

calculate_means2(df1, y, b)
# A tibble: 1 x 2
  mean_column mean_vector
        <dbl>       <dbl>
1        19.3        5.00

2) We could use as_quosure to emphasize this. That will put the value of y in the quosure formula. In the example, y <- as_quosure(y) would cause the new quosure to contain ~c(6, 4).

calculate_means3 <- function(df, x, y) {
  x <- enquo(x)
  y <- rlang::as_quosure(y)
  summarise(df, mean_column = mean(!!x), mean_vector = mean(!!y))
}

calculate_means3(df1, y, b)
# A tibble: 1 x 2
  mean_column mean_vector
        <dbl>       <dbl>
1        19.3        5.00

3) Of course, we could just use a formal argument name that is unlikely to be used in the data frame:

calculate_means4 <- function(df, x, y.) {
  x <- enquo(x)
  summarise(df, mean_column = mean(!!x), mean_vector = mean(y.))
}

calculate_means4(df1, y, b)
# A tibble: 1 x 2
  mean_column mean_vector
        <dbl>       <dbl>
1        19.3        5.00
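
4) As a further sketch, assuming a version of dplyr and rlang recent enough to provide the .env pronoun (which, as far as I know, postdates this thread), we can state explicitly that y should come from the function's environment rather than from the data frame. The name calculate_means5 is hypothetical.

```r
library(dplyr)

b <- c(6, 4)
df1 <- tibble(x = c(6, 8, 12), y = c(24, 18, 16))

calculate_means5 <- function(df, x, y) {
  x <- enquo(x)
  # .env$y skips the columns of df and reads the function argument y
  summarise(df, mean_column = mean(!!x), mean_vector = mean(.env$y))
}

res5 <- calculate_means5(df1, y, b)
res5
```

The counterpart .data$y would do the opposite and always refer to the column, so the two pronouns together make the intent unambiguous.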
G. Grothendieck
  • The first solution is very clear and simple. But I'm wondering in what cases we would want the arguments of our own function to be evaluated without !! when they are used inside summarise. It seems that we always want to use !! with the arguments of our function, so why isn't that the default? It's different when we want to evaluate something with the same value each time we call the function; in that case we don't create an argument in our function for that value, so there is no need for !! – bienqueda Feb 21 '18 at 16:23
  • No matter which one is looked up first, if you want the other one it would still be necessary to specify that somehow. – G. Grothendieck Feb 21 '18 at 20:28