Passing arguments to dplyr summarize function

Question

I am trying to use the summarize function within dplyr to calculate summary statistics using a two-argument function that takes a table and a field name from a connected database. Unfortunately, as soon as I wrap summarize in another function, the results are no longer correct: the resulting data frame repeats the same values in every row instead of computing them per group. The input and output are shown below:

Summary Statistics Function

library(dplyr)

data<-iris
data<- group_by(.data = data,Species)

SummaryStatistics <- function(table, field){
table %>%
summarise(count = n(),
          min = min(table[[field]], na.rm = T),
          mean = mean(table[[field]], na.rm = T, trim=0.05),
          median = median(table[[field]], na.rm = T))
}

SummaryStatistics(data, "Sepal.Length")

Output Table--Incorrect, it's just repeating the same calculation

     Species count   min     mean median
1     setosa    50   4.3 5.820588    5.8
2 versicolor    50   4.3 5.820588    5.8
3  virginica    50   4.3 5.820588    5.8

Correct Table/Desired Outcome--This is what the table should look like. When I run the summarize function outside of the wrapper function, this is what it produces.

      Species count   min     mean median
 1     setosa    50   4.3 5.002174    5.0
 2 versicolor    50   4.9 5.934783    5.9
 3  virginica    50   4.9 6.593478    6.5

I hope this is easy to understand. I can't grasp why the summary statistics work perfectly outside of the wrapper function, but as soon as I pass arguments to it, it calculates the same thing for each row. Any help would be greatly appreciated.

Thanks, Kev

Hard to diagnose without knowing how you're using the wrapper function. But at a guess, once inside the wrapper function, `summarize` might not know about the grouping factors being used in the calculation. So it would return the same summary for all rows. — jdobres, Jan 21 '17 at 19:18
You'll need to use standard evaluation. Read the `dplyr` vignette on it for a better idea. — Jake Kaupp, Jan 21 '17 at 22:59
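
A minimal sketch of what is likely going wrong in the wrapper, assuming the same grouped iris data as in the question (the names `grouped` and `out` are illustrative and not from the original post). The `count` column in the incorrect output is 50 per species, which suggests `summarise` does see the groups. However, `table[[field]]` is ordinary list-style extraction on the whole data frame, so every group receives the full column:

```r
library(dplyr)

# Same grouping as in the question; `grouped` is an illustrative name
grouped <- group_by(iris, Species)

# [[ ]] on the data frame ignores the grouping and returns the whole column
length(grouped[["Sepal.Length"]])

# Inside summarise(), n() is evaluated per group, but grouped[["Sepal.Length"]]
# is looked up in the calling environment and is the same full column each time
out <- summarise(grouped,
                 n_group = n(),
                 n_extracted = length(grouped[["Sepal.Length"]]))
print(out)
```

If this reading is right, `n_group` differs from `n_extracted` in every row, which is consistent with the repeated min, mean, and median in the question.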

score 13 · Accepted Answer · edited Jul 07 '20 at 08:34

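You need to use the standard evaluation (SE) versions of the dplyr verbs, the ones ending in an underscore such as summarise_, together with lazyeval to use dplyr functions programmatically. Inside the wrapper, table[[field]] pulls the entire column rather than the rows of the current group, which is why every row shows the same values. The dplyr NSE vignette covers it fairly well.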

library(dplyr)
library(lazyeval)

data <- group_by(iris, Species)

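# summarise_() is the standard-evaluation variant of summarise().
# Each statistic is a formula; interp() substitutes the column named
# by `field` (converted to a symbol with as.name()) for `var`.
SummaryStatistics <- function(table, field){
  table %>%
    summarise_(count = ~n(),
              min = interp(~min(var, na.rm = TRUE), var = as.name(field)),
              mean = interp(~mean(var, na.rm = TRUE, trim = 0.05), var = as.name(field)),
              median = interp(~median(var, na.rm = TRUE), var = as.name(field)))
}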

SummaryStatistics(data, "Sepal.Length")

# A tibble: 3 × 5
     Species count   min     mean median
      <fctr> <int> <dbl>    <dbl>  <dbl>
1     setosa    50   4.3 5.002174    5.0
2 versicolor    50   4.9 5.934783    5.9
3  virginica    50   4.9 6.593478    6.5

edited Jul 07 '20 at 08:34 · pietrodito

answered Jan 22 '17 at 00:58 · Jake Kaupp

Thank you for answering my question, and more importantly thank you for linking to the document on how to use dplyr programmatically. I was searching for something like this but couldn't track it down. I really appreciate the thoroughness of your answer. Thanks again man. – AlphaKevy Jan 22 '17 at 01:17
The NSE vignette link is dead; it looks like it's been replaced with the [Programming with dplyr vignette](https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html). – ropeladder Nov 02 '17 at 17:17
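
For readers on a recent dplyr, here is a hedged sketch of the same wrapper written in the style described in the Programming with dplyr vignette. It is not part of the original answer. It assumes a dplyr version that provides the `.data` pronoun (0.7 or later), where the underscore verbs such as `summarise_` are deprecated. `.data[[field]]` looks up the column by its string name within the current group, so lazyeval is not needed:

```r
library(dplyr)

data <- group_by(iris, Species)

# Same signature as the original wrapper: a (grouped) table and a column name
# given as a string. .data[[field]] refers to the current group's slice of
# that column, unlike table[[field]], which returns the whole column.
SummaryStatistics <- function(table, field) {
  table %>%
    summarise(
      count  = n(),
      min    = min(.data[[field]], na.rm = TRUE),
      mean   = mean(.data[[field]], na.rm = TRUE, trim = 0.05),
      median = median(.data[[field]], na.rm = TRUE)
    )
}

result <- SummaryStatistics(data, "Sepal.Length")
print(result)
```

Under these assumptions, the result should have one row per species with per-group statistics, in line with the desired table in the question.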
