
I am trying to write a function that takes a list as input and returns a summarised data frame. However, after several attempts, I cannot get the list of aggregation expressions passed through to the summarising call.

So far I have the following, but it is failing.

library(plyr)  # ddply() and .() come from plyr, not dplyr

random_df <- data.frame(
  region = c("A", "B", "C", "C"),
  number_of_reports = c(1, 3, 2, 1),
  report_MV = c(12, 33, 22, 12)
)

output_graph <- function(input) {
    print(input$arguments)
    DF <- input$DF
    group_by <- input$group_by
    args <- input$arguments
    flow <- ddply(DF, group_by, summarize, args)
    return(flow)
}

graph_functions <- list(
    DF = random_df,
    group_by = .(region),
    arguments = .(Reports = sum(number_of_reports),
                  MV_Reports = sum(report_MV))
)

output_graph(graph_functions)

Whereas this works, with the summary expressions hard-coded in the call:

library(plyr)  # ddply() and .() come from plyr, not dplyr

random_df <- data.frame(
  region = c("A", "B", "C", "C"),
  number_of_reports = c(1, 3, 2, 1),
  report_MV = c(12, 33, 22, 12)
)

output_graph <- function(input) {
    print(input$arguments)
    DF <- input$DF
    group_by <- input$group_by
    args <- input$arguments
    flow <- ddply(
      DF,
      group_by, 
      summarize,
      Reports = sum(number_of_reports),
      MV_Reports = sum(report_MV)
    )
    return(flow)
}

graph_functions <- list(
  DF = random_df,
  group_by = .(region),
  arguments = .(Reports = sum(number_of_reports),
                MV_Reports = sum(report_MV))
)

output_graph(graph_functions)

Is there a way to pass a list of summary expressions to ddply, or another way to achieve the same goal of aggregating a dynamic set of variables?

Kevin Arseneau
Abel Riboulot

1 Answer


To pass arguments into a function for use by dplyr, I recommend reading about non-standard evaluation (NSE). Here is an edited function producing the same output as my original.

library(dplyr)

random_df <- data.frame(
  region = c('A','B','C','C'),
  number_of_reports = c(1, 3, 2, 1),
  report_MV = c(12, 33, 22, 12)
)

output_graph <- function(df, group, args) {

  grp_quo <- enquo(group)

  df %>%
    group_by(!!grp_quo) %>%
    summarise(!!!args)

}

args <- list(
  Reports = quo(sum(number_of_reports)),
  MV_Reports = quo(sum(report_MV))
)

output_graph(random_df, region, args)

# # A tibble: 3 x 3
#   region Reports MV_Reports
#   <fctr>   <dbl>      <dbl>
# 1 A         1.00       12.0
# 2 B         3.00       33.0
# 3 C         3.00       34.0
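Since the goal is to drive both the grouping and the summaries from a single list, as with `graph_functions` in the question, the same idea can be extended by storing quosures in the list and splicing them with `!!!`. The following is only a sketch: `summarise_spec`, `spec`, and `spec_counts` are hypothetical names introduced for illustration, and it assumes dplyr 0.7 or later with plyr not attached.

```r
library(dplyr)

random_df <- data.frame(
  region = c("A", "B", "C", "C"),
  number_of_reports = c(1, 3, 2, 1),
  report_MV = c(12, 33, 22, 12)
)

# Hypothetical helper: `spec` bundles the data, the grouping variables and
# the summary expressions, mirroring the list layout used in the question.
summarise_spec <- function(spec) {
  spec$DF %>%
    group_by(!!!spec$group_by) %>%    # splice the grouping quosures
    summarise(!!!spec$arguments)      # splice the named summary quosures
}

# quos() captures the expressions without evaluating them,
# playing the role that .() plays for plyr.
spec <- list(
  DF = random_df,
  group_by = quos(region),
  arguments = quos(
    Reports = sum(number_of_reports),
    MV_Reports = sum(report_MV)
  )
)

result <- summarise_spec(spec)

# A different set of summaries only requires a different list.
spec_counts <- list(
  DF = random_df,
  group_by = quos(region),
  arguments = quos(number_reports = length(number_of_reports))
)

counts <- summarise_spec(spec_counts)
```

Because the expressions live in the list rather than in the function body, the function itself never needs to change when the set of summaries does.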
Kevin Arseneau
  • Hi Kevin, thanks a lot for the code. What I am after is actually a function which would do this summarization dynamically. In this case the features are `.(Reports = sum(number_of_reports), MV_Reports = sum(report_MV))`, but it might as well be `.(number_reports = length(number_of_reports), MV_Reports = sum(report_MV))` or `.(number_reports = length(number_of_reports))`. Therefore I aim to make a very versatile function which would take a list as an input and pass it to ddply. Thanks again! – Abel Riboulot Dec 27 '17 at 04:06
  • You may need to clarify further: are you saying you want to assign the `group_by` and `summarise` arguments dynamically? – Kevin Arseneau Dec 27 '17 at 04:08
  • That's exactly right, I want to assign the group_by and summarise dynamically. – Abel Riboulot Dec 27 '17 at 04:12
  • Thanks a ton for the help! The code looks exactly like what I'm trying to do, however it throws the following error for me: `Error in !args : invalid argument type`. I'll try to debug and let you know. – Abel Riboulot Dec 27 '17 at 04:50
  • @AbelRiboulot, it is likely a conflict with `plyr`, I am only using `dplyr` verbs – Kevin Arseneau Dec 27 '17 at 04:53
  • Yes, it was. Thanks a lot, works like a charm, and will give a read regarding NSE. – Abel Riboulot Dec 27 '17 at 04:54
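
The `Error in !args` message discussed in the comments is consistent with `plyr::summarise()` masking `dplyr::summarise()` when plyr is attached after dplyr: plyr does not understand `!!!` and evaluates it as ordinary logical negation of a list. A minimal sketch of two ways to guard against this is shown below. It assumes plyr is not needed elsewhere in the session; the data and function are the same as in the answer.

```r
library(dplyr)

random_df <- data.frame(
  region = c("A", "B", "C", "C"),
  number_of_reports = c(1, 3, 2, 1),
  report_MV = c(12, 33, 22, 12)
)

# Option 1 (assumption: plyr is not needed later in this session):
# detach it so that its summarise() no longer masks dplyr's.
if ("package:plyr" %in% search()) {
  detach("package:plyr", character.only = TRUE)
}

# Option 2: qualify the verbs with their namespace, so that the order in
# which packages were attached cannot change which function is called.
output_graph <- function(df, group, args) {
  grp_quo <- enquo(group)

  df %>%
    dplyr::group_by(!!grp_quo) %>%
    dplyr::summarise(!!!args)
}

args <- list(
  Reports = quo(sum(number_of_reports)),
  MV_Reports = quo(sum(report_MV))
)

flow <- output_graph(random_df, region, args)
```

Either option alone should be enough; the namespace-qualified version is the more defensive choice inside a reusable function.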