
I have a function that calculates the means of a grouped data frame for a column chosen by the content of a variable, VarName. The current function uses dplyr::summarize_, but I now see that it is deprecated, and I want to replace it before it is removed entirely.

However, I'm not sure how to use the new unquoting to achieve what I'm trying to do. Here's my current code:

means<-summarize_(group_by(dat,Grade),.dots = setNames(paste0('mean(',VarName,',na.rm=TRUE)'),'means'))

I tried replacing the .dots part with means = mean(!!VarName, na.rm = TRUE), but that just used the string inside VarName rather than the column it names. What I need is for the string in VarName to be evaluated as a column name within dat, so that I get a column named "means" with the mean of each group. How can I achieve that with the new summarize?

Sample dataset for reproducibility:

VarName<-"Things"
dat<-data.frame(students=c("a","b","c","d","e"),Grade=c(2,2,2,3,3),varA=c(41:45),Things=c(90,100,80,75,80))

Thanks!

Asked by iod, edited by eipi10


Turning this into a function and generalizing for arbitrary data, grouping variable, and value variable:

library(tidyverse)

means <- function(data, group, value) {

  group = enquo(group)
  value = enquo(value)
  # Build the output column name from the captured argument, e.g. "mean_Things"
  value_name = paste0("mean_", rlang::as_name(value))

  data %>% group_by(!!group) %>% 
    summarise(!!value_name := mean(!!value, na.rm=TRUE))
}

means(dat, Grade, Things)
  Grade mean_Things
  <dbl>       <dbl>
1  2.00        90.0
2  3.00        77.5
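In more recent versions of rlang (0.4.0 and later), the enquo()/!! pair can be written as a single {{ }} ("curly-curly") operator, and the result column can be named with a glue-style string on the left of :=. A minimal sketch, assuming dplyr 1.0 or later; means_curly is a hypothetical name chosen so it does not overwrite the means function above:

```r
library(dplyr)

# Sample data from the question
dat <- data.frame(students = c("a", "b", "c", "d", "e"),
                  Grade = c(2, 2, 2, 3, 3),
                  varA = 41:45,
                  Things = c(90, 100, 80, 75, 80))

# {{ }} captures the caller's bare column name and injects it in one step;
# "mean_{{ value }}" builds the output name from that same argument
means_curly <- function(data, group, value) {
  data %>%
    group_by({{ group }}) %>%
    summarise("mean_{{ value }}" := mean({{ value }}, na.rm = TRUE))
}

means_curly(dat, Grade, Things)
```

The call should give the same Grade / mean_Things columns and values as the enquo() version.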

If I understand your comment, how about the function below, which takes a string for the value argument:

means <- function(data, group, value) {

  group = enquo(group)
  value_name = paste0("mean_", value)
  value = sym(value)

  data %>% group_by(!!group) %>% 
    summarise(!!value_name := mean(!!value, na.rm=TRUE))
}

VarName = "Things"

means(dat, Grade, VarName)
  Grade mean_Things
  <dbl>       <dbl>
1  2.00        90.0
2  3.00        77.5

Since the function is generalized, you can do this with any data frame. For example:

means(mtcars, cyl, "mpg")
    cyl mean_mpg
  <dbl>    <dbl>
1  4.00     26.7
2  6.00     19.7
3  8.00     15.1

You can generalize the function still further. For example, this version takes an arbitrary number of grouping columns:

means <- function(data, value, ...) {

  group = quos(...)
  value_name = paste0("mean_", value)
  value = sym(value)

  data %>% group_by(!!!group) %>% 
    summarise(!!value_name := mean(!!value, na.rm=TRUE))
}

VarName = "Things"

means(dat, VarName, students, Grade)
  students Grade mean_Things
  <fct>    <dbl>       <dbl>
1 a         2.00        90.0
2 b         2.00       100  
3 c         2.00        80.0
4 d         3.00        75.0
5 e         3.00        80.0
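In the Shiny setting from the comments, the grouping columns may also arrive as strings. One way to handle that, sketched here under the assumption of dplyr 1.0 or later (for the .groups argument), is rlang::syms(), which turns a character vector into a list of symbols that !!! can splice into group_by(). means_multi and its groups argument are hypothetical names:

```r
library(dplyr)

dat <- data.frame(students = c("a", "b", "c", "d", "e"),
                  Grade = c(2, 2, 2, 3, 3),
                  varA = 41:45,
                  Things = c(90, 100, 80, 75, 80))

# value: a single column name as a string
# groups: a character vector of grouping column names
means_multi <- function(data, value, groups) {
  value_name <- paste0("mean_", value)
  data %>%
    # syms() converts each string to a symbol; !!! splices them as separate arguments
    group_by(!!!rlang::syms(groups)) %>%
    # sym() converts the value string to a symbol so it refers to the column
    summarise(!!value_name := mean(!!rlang::sym(value), na.rm = TRUE),
              .groups = "drop")
}

means_multi(dat, "Things", c("students", "Grade"))
```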
Answer by eipi10
  • Thanks. However, I don't want this to be within a function. It's all inside a Shiny app, and VarName comes from the UI side. When I try doing what you did here in my code, I get all `NA`s. I think the problem is that for this to work `Things` has to be unquoted, but I'm not sure how I can make it unquoted in the first place when I get it through the `input`... – iod Mar 29 '18 at 02:10
  • I'm not looking to generalize it further. Quite the opposite: I'm trying to do this without putting everything into a function... It doesn't seem to work when I do that. – iod Mar 29 '18 at 02:33
  • OK, it works now! Great, thanks! Can you explain the difference between `enquo()` and `sym()`? – iod Mar 29 '18 at 03:29
  • `enquo` is used within functions to turn a name (like `Grade`) into a quosure. `sym` turns a string into a name. I'm not actually sure why `sym(value)` can later be unquoted with `!!` without having been turned into a quosure. I don't really get the tidyeval/quosure thing and find it difficult and confusing to use in all but the simplest applications. – eipi10 Mar 29 '18 at 05:16
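On the question in the last comment: !! does not need a quosure. It injects whatever value it is given into the captured expression, so a string becomes a string constant and a symbol becomes a bare name, which summarise() then looks up as a column. A quosure additionally records the environment the expression came from, which is why enquo() is used for arguments typed by the caller. The sketch below uses rlang::expr() only to display what gets injected; capture is a hypothetical helper:

```r
library(rlang)

VarName <- "Things"

# Injecting the string itself: mean() would receive the text "Things"
expr(mean(!!VarName, na.rm = TRUE))
# prints mean("Things", na.rm = TRUE)

# Injecting a symbol built from the string: inside summarise() this is the column
expr(mean(!!sym(VarName), na.rm = TRUE))
# prints mean(Things, na.rm = TRUE)

# enquo() captures what the caller typed (here the bare name Things)
# together with the caller's environment, giving a quosure
capture <- function(x) enquo(x)
q <- capture(Things)
quo_get_expr(q)
# prints Things
```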

Use !! with as.name or as.symbol:

library(dplyr)

dat %>% 
    group_by(Grade) %>% 
    summarize(means = mean(!!as.name(VarName), na.rm = TRUE))
    # or summarize(means = mean(!!as.symbol(VarName), na.rm = TRUE))

# A tibble: 2 x 2
#  Grade means
#  <dbl> <dbl>
#1  2.00  90.0
#2  3.00  77.5
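Another option that avoids !! entirely is the .data pronoun, which lets you index a column by a string inside dplyr verbs. A sketch assuming a reasonably recent dplyr that provides .data; in the Shiny app, VarName would hold the string coming from the UI:

```r
library(dplyr)

dat <- data.frame(students = c("a", "b", "c", "d", "e"),
                  Grade = c(2, 2, 2, 3, 3),
                  varA = 41:45,
                  Things = c(90, 100, 80, 75, 80))

VarName <- "Things"

# .data[[VarName]] looks the column up by its name as a string,
# so no symbol conversion or !! is needed
dat %>%
  group_by(Grade) %>%
  summarize(means = mean(.data[[VarName]], na.rm = TRUE))
```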
Answer by Psidom