Using NSE enquo with dply and tidyr (issue with complete(nesting(...), ...)

Question

I'm trying to use the NSE with dplyr and tidyr.

I have a list of product, their volume per month per region. The products and regions have hierarchical structure (ProductLevel1 coarser than ProductLevel2 coarser than Product Level3 ...). Before making a graph, I want to add the months with no volume using tidyr::nesting and tidyr::complete.

This is at the level of tidyr::nesting the function seems to fail.

f_test <- function(IS1, finerProductLevel, finerRegionLevel) {  
  qfinerProductLevel <- enquo(finerProductLevel)
  qfinerRegionLevel <- enquo(finerRegionLevel)
  IS2 <- IS1 %>% 
    group_by(calDate, !! qfinerRegionLevel, !! qfinerProductLevel) %>%
    summarize(Vol = sum(Vol, na.rm = TRUE)) %>%
    ungroup()

  IS3 <- IS2 %>%
    complete(nesting(!! qfinerProductLevel, !! qfinerRegionLevel), 
             calDate, fill = list(Vol=0))
}
f_test(IS1, ProductLevel4, RegionLevel4_18)

the error I receive is:

Error in eval_tidy(xs[[i]], unique_output) : object 'ProductLevel4' not found

If I remove the !!s in the nesting function, then the error is

Error: Columns `qfinerProductLevel`, `qfinerRegionLevel` must be 1d atomic vectors or lists

There was a similar question in the past with no answer: dplyr programming: unquote-splicing causes overscope error with complete() and nesting()

A sample of the dataset:

IS1 <- structure(list(COPASalesOffice = c("DACH", "DACH", "EMO", "EMO", 
"France", "French Africa", "Israel", "Italy", "Middle East", 
"Nordic", "South Hub", "South Hub", "Spain Portugal", "Spain Portugal", 
"Turkey", "Turkey", "Turkey", "UK Ireland", "UK Ireland"), calDate = structure(c(16861, 
17440, 16983, 17410, 16436, 16709, 16161, 16222, 17136, 16526, 
16861, 16922, 16587, 17136, 16495, 16283, 16983, 16222, 16740
), class = "Date"), Vol = c(75.17, 121.731, 0, 23, 0, 15, 30, 
173.36, 0, 9.217, -1, 0, 0.849, 12.154, 0, 0, 25, 1, 20.313), 
    RegionLevel0 = c("EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", 
    "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", 
    "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR"
    ), RegionLevel1 = c("Europe", "Europe", "MEAR", "MEAR", "Europe", 
    "MEAR", "MEAR", "Europe", "MEAR", "Europe", "Europe", "Europe", 
    "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", 
    "Europe"), RegionLevel2 = c("EU5-NB", "EU5-NB", "CIS", "CIS", 
    "EU5-NB", "MEA", "MEA", "EU5-NB", "MEA", "EU5-NB", "CEE", 
    "CEE", "EU5-NB", "EU5-NB", "CEE", "CEE", "CEE", "EU5-NB", 
    "EU5-NB"), RegionLevel4_18 = c("R01", "R01", "R20", "R20", 
    "R03", "R14", "R17", "R04", "R12", "R06", "R08", "R08", "R05", 
    "R05", "R09", "R09", "R09", "R02", "R02"), ProductLevel1 = c("aqueous", 
    "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", 
    "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", 
    "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous"
    ), ProductLevel2 = c("aq8", "aq8", "aq8", "aq8", "aq8", "aq4", 
    "aq4", "aq4", "aq8", "aq8", "aq8", "aq8", "aq8", "aq8", "aq8", 
    "aq8", "aq4", "aq8", "aq8"), ProductLevel3 = c("8.24-44", 
    "8.24-44", "8.24-44", "8.24-44", "8.24-44", "4.24-44", "4.24-44", 
    "4.24-44", "8.A2", "8.24-44", "64in", "8.A2", "8.24-44", 
    "64in", "8.24-44", "8.24-44", "4.24-44", "8.24-44", "8.24-44"
    ), ProductLevel4 = c("T636-596-642, H&H", "T804, SC-P6/7/8/9000", 
    "T636-596-642, H&H", "T636-596-642, H&H", "T40X, 9000", "T692-693-694-698 Opr", 
    "T611-612, 74x0-94x0", "T611-612, 74x0-94x0", "T850, SCP800", 
    "T543-544, 40-76-9600", "T591, 11880", "T605-606, 48x0", 
    "T543-544, 40-76-9600", "T591, 11880", "T46X, 7000", "T602-603, 78x0-98x0", 
    "T692-693-694-698 Opr", "T47X, 9500", "T543-544, 40-76-9600"
    )), row.names = c(NA, -19L), class = c("tbl_df", "tbl", "data.frame"
), .Names = c("COPASalesOffice", "calDate", "Vol", "RegionLevel0", 
"RegionLevel1", "RegionLevel2", "RegionLevel4_18", "ProductLevel1", 
"ProductLevel2", "ProductLevel3", "ProductLevel4"))

and the SessionInfo:

Rstudio 1.1.442

R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base  

other attached packages:
[1] forcats_0.3.0   stringr_1.3.0   dplyr_0.7.4     purrr_0.2.4     readr_1.1.1     tidyr_0.8.0     tibble_1.4.2    ggplot2_2.2.1   tidyverse_1.2.1

CJ Yetman · Accepted Answer · 2018-04-25T21:53:40.907

In the call to your f_test() function, the 2nd and 3rd arguments you are passing are names of objects that don't exist. You probably want to quote them so that they are passed as strings. Additionally, you probably want your f_test() function to return something... I suspect return(IS3) is the result you're expecting.

IS1 <- structure(list(COPASalesOffice = c("DACH", "DACH", "EMO", "EMO", "France", "French Africa", "Israel", "Italy", "Middle East", "Nordic", "South Hub", "South Hub", "Spain Portugal", "Spain Portugal", "Turkey", "Turkey", "Turkey", "UK Ireland", "UK Ireland"), calDate = structure(c(16861, 17440, 16983, 17410, 16436, 16709, 16161, 16222, 17136, 16526, 16861, 16922, 16587, 17136, 16495, 16283, 16983, 16222, 16740 ), class = "Date"), Vol = c(75.17, 121.731, 0, 23, 0, 15, 30, 173.36, 0, 9.217, -1, 0, 0.849, 12.154, 0, 0, 25, 1, 20.313), RegionLevel0 = c("EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR", "EMEAR" ), RegionLevel1 = c("Europe", "Europe", "MEAR", "MEAR", "Europe", "MEAR", "MEAR", "Europe", "MEAR", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe", "Europe"), RegionLevel2 = c("EU5-NB", "EU5-NB", "CIS", "CIS", "EU5-NB", "MEA", "MEA", "EU5-NB", "MEA", "EU5-NB", "CEE", "CEE", "EU5-NB", "EU5-NB", "CEE", "CEE", "CEE", "EU5-NB", "EU5-NB"), RegionLevel4_18 = c("R01", "R01", "R20", "R20", "R03", "R14", "R17", "R04", "R12", "R06", "R08", "R08", "R05", "R05", "R09", "R09", "R09", "R02", "R02"), ProductLevel1 = c("aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous", "aqueous" ), ProductLevel2 = c("aq8", "aq8", "aq8", "aq8", "aq8", "aq4", "aq4", "aq4", "aq8", "aq8", "aq8", "aq8", "aq8", "aq8", "aq8", "aq8", "aq4", "aq8", "aq8"), ProductLevel3 = c("8.24-44", "8.24-44", "8.24-44", "8.24-44", "8.24-44", "4.24-44", "4.24-44", "4.24-44", "8.A2", "8.24-44", "64in", "8.A2", "8.24-44", "64in", "8.24-44", "8.24-44", "4.24-44", "8.24-44", "8.24-44" ), ProductLevel4 = c("T636-596-642, H&H", "T804, SC-P6/7/8/9000", "T636-596-642, H&H", "T636-596-642, H&H", "T40X, 9000", "T692-693-694-698 Opr", "T611-612, 74x0-94x0", "T611-612, 74x0-94x0", "T850, SCP800", "T543-544, 40-76-9600", "T591, 11880", "T605-606, 48x0", "T543-544, 40-76-9600", "T591, 11880", "T46X, 7000", "T602-603, 78x0-98x0", "T692-693-694-698 Opr", "T47X, 9500", "T543-544, 40-76-9600" )), row.names = c(NA, -19L), class = c("tbl_df", "tbl", "data.frame" ), .Names = c("COPASalesOffice", "calDate", "Vol", "RegionLevel0", "RegionLevel1", "RegionLevel2", "RegionLevel4_18", "ProductLevel1", "ProductLevel2", "ProductLevel3", "ProductLevel4"))

library(dplyr)
library(tidyr)

f_test <- function(IS1, finerProductLevel, finerRegionLevel) {  
  qfinerProductLevel <- enquo(finerProductLevel)
  qfinerRegionLevel <- enquo(finerRegionLevel)
  IS2 <- IS1 %>% 
    group_by(calDate, !! qfinerRegionLevel, !! qfinerProductLevel) %>%
    summarize(Vol = sum(Vol, na.rm = TRUE)) %>%
    ungroup()

  IS3 <- IS2 %>%
    complete(nesting(!! qfinerProductLevel, !! qfinerRegionLevel), 
             calDate, fill = list(Vol=0))

  return(IS3)
}
f_test(IS1, "ProductLevel4", "RegionLevel4_18")

Update based on comment

If you're trying to replicate the result of something like this...

IS1 %>% 
  group_by(calDate, ProductLevel4, RegionLevel4_18) %>%
  summarize(Vol = sum(Vol, na.rm = TRUE)) %>%
  ungroup() %>%
  complete(nesting(ProductLevel4, RegionLevel4_18), 
           calDate, fill = list(Vol=0))

with a function that excepts "bare variable names" (not strings), like this...

f_test(IS1, ProductLevel4, RegionLevel4_18)

then you could write it like this...

f_test <-
  function(df, grp1, grp2) {
    sym_grp1 <- rlang::ensym(grp1)
    sym_grp2 <- rlang::ensym(grp2)
    df %>% 
      group_by(calDate, !!sym_grp1, !!sym_grp2) %>% 
      summarize(Vol = sum(Vol, na.rm = TRUE)) %>%
      ungroup() %>%
      complete(nesting(!!sym_grp1, !!sym_grp2), 
               calDate, fill = list(Vol=0))
  }

f_test(IS1, ProductLevel4, RegionLevel4_18)

Further Update

You could also use rlang::ensyms() to do both/all variable at once, and then !!! to unquote and splice them...

test <- function(data, var1, var2) {
  vars <- rlang::ensyms(var1, var2)

  data %>%
    group_by(calDate, !!!vars) %>%
    summarize(Vol = sum(Vol, na.rm = TRUE)) %>%
    ungroup() %>%
    complete(nesting(!!!vars), calDate, fill = list(Vol=0))
}
test(IS1, ProductLevel4, RegionLevel4_18)

which would then allow you to use ... to accept an arbitrary number of arguments...

test <- function(data, ...) {
  vars <- rlang::ensyms(...)

  data %>%
    group_by(calDate, !!!vars) %>%
    summarize(Vol = sum(Vol, na.rm = TRUE)) %>%
    ungroup() %>%
    complete(nesting(!!!vars), calDate, fill = list(Vol=0))
}
test(IS1, ProductLevel4, RegionLevel4_18)
test(IS1, ProductLevel4)
test(IS1, ProductLevel4, RegionLevel4_18, ProductLevel2)

thanks for your help. When replacing enquo by rlang::sym, it works. Then it is solved, even if I hoped for a solution with enquo. When I reproduce your proposition with enquo, the columns are named \`"ProductLevel4"\` and \`"RegionLevel4_18"\` and their values are replaced by a unique value is "ProductLevel4" and "RegionLevel4_18". — Fabien N., Apr 24 '18 at 20:46
I think the confusion might be related to `dplyr` not yet being fully converted over to the new `rlang` style of things, while `tidyr` has been already (why `enquo` works with `group_by` but not `nesting`). — CJ Yetman, Apr 24 '18 at 22:33

Using NSE enquo with dply and tidyr (issue with complete(nesting(...), ...)

1 Answers1