I am trying to write a function that operates on a data.frame and accepts dplyr-style arguments, i.e. unquoted column names captured with dplyr's pronouns (or whatever we call them).

But I have encountered a problem when using !! inside a bracketed expression (see the examples below).

Examples:

First a data.frame:

library(dplyr)

df <- data.frame(gah=c('a','a','a','a','b','b','b','b'), 
                 fruit=c('apple','apple','apple','banana','banana','banana','dog','dog'),
                 val=1:8, 
                 sss=-7:0,
                 mean=0)

The first function averages a fixed column (val) as well as a column given by the argument. It does not modify the grouping:

a_func <- function(df, value=val) {
  value_ = enquo(value)
  df %>% summarise(mean=mean(!!value_), mean_val=mean(val), n=n())
}
a_func(df, sss)
df %>% group_by(gah) %>% a_func()
df %>% group_by(gah) %>% a_func(sss)
df %>% group_by(gah, fruit) %>% a_func

This works as expected.

The next function adds a grouping variable before using summarise:

c_func <- function(df, gr) {
  gr_ = enquo(gr)
  df %>% group_by(!!gr_) %>% summarise(n=n())
}
c_func(df, gah)
c_func(df, gr=gah)
c_func(df, fruit)

This also works as expected.

Next, I combine the two. That should be doable - and it in fact is! Praise the Holy Kitten!

b_func <- function(df, value=val, gr=NA) {
  value_ = enquo(value)
  gr_ = enquo(gr)
  df %>% group_by(!!gr_, add=TRUE) %>%
    summarise(mean=mean(!!value_), mean_val=mean(val))
}
b_func(df, sss)
df %>% group_by(gah) %>% b_func(gr=fruit)
b_func(df, gr=fruit)
df %>% group_by(gah) %>% b_func(sss, fruit)

It works as expected. However, since gr is optional, I would like to add the grouping variable only when gr is not NA.

This is where it breaks: when I add a conditional to do the grouping only when gr is not NA, looking up the quosure from within the brackets somehow does not work.

d_func <- function(df, value=val, gr=NA) {
  value_ = enquo(value)
  gr_ = enquo(gr)
  if (!is.na(gr)) {
    df <- df %>% group_by(!!gr_)
  }
  df %>% 
    summarise(mean=mean(!!value_), mean_val=mean(val))
}
d_func(df, sss) # works
df %>% group_by(gah) %>% d_func(gr=fruit)
# Error in d_func(., gr = fruit) : object 'fruit' not found
d_func(df, gr=fruit) 
# Error in d_func(df, gr = fruit) : object 'fruit' not found
df %>% group_by(gah) %>% d_func(sss, fruit)
# Error in d_func(., sss, fruit) : object 'fruit' not found

It is clearly due to !!gr_ being called within the scope of additional brackets; remove the if and its brackets and d_func is equivalent to b_func, and both group by a column NA.

I do not understand why this occurs or how to solve this.

Updated with sessionInfo

R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Danish_Denmark.1252  LC_CTYPE=Danish_Denmark.1252    LC_MONETARY=Danish_Denmark.1252
[4] LC_NUMERIC=C                    LC_TIME=Danish_Denmark.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rlang_0.2.0          bindrcpp_0.2.2       lemon_0.4.0          tidyr_0.8.0          magrittr_1.5        
[6] dplyr_0.7.4          odbc_1.1.5           RevoUtils_10.0.9     RevoUtilsMath_10.0.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16       pillar_1.2.1       compiler_3.4.4     plyr_1.8.4         bindr_0.1.1        tools_3.4.4       
 [7] bit_1.1-12         tibble_1.4.2       gtable_0.2.0       lattice_0.20-35    pkgconfig_2.0.1    openxlsx_4.0.17   
[13] cli_1.0.0          rstudioapi_0.7     DBI_0.8            yaml_2.1.18        gridExtra_2.3      knitr_1.20        
[19] hms_0.4.2          bit64_0.9-7        grid_3.4.4         tidyselect_0.2.4   glue_1.2.0         R6_2.2.2          
[25] ggplot2_2.2.1.9000 purrr_0.2.4        blob_1.1.1         scales_0.5.0       assertthat_0.2.0   colorspace_1.3-2  
[31] utf8_1.1.3         lazyeval_0.2.1     munsell_0.4.3      crayon_1.3.4     
MrGumble
  • I get a warning instead of an error with the last set of code blocks. The reason is due to `if (!is.na(gr))` where `if/else` works on a single element instead of a vector with length > 1. Could be due to the versions for the error? – akrun Jun 22 '18 at 06:52
  • The warning should also go away with `if (!is.na(quo_name(gr_)))` – akrun Jun 22 '18 at 06:57
  • I've updated with sessionInfo. As far as I can see, `gr` can not be interpreted as a vector, so why should the if complain? – MrGumble Jun 22 '18 at 06:58
  • It is a quosure, so `length(quo(fruit))# [1] 2` and the reason is `as.character(quo(fruit))# [1] "~" "fruit"` I am using `dplyr_0.7.5` with `rlang_0.2.1` – akrun Jun 22 '18 at 07:00
  • `df %>% group_by(gah) %>% d_func(gr=fruit) %>% dim#[1] 3 3` or `d_func(df, gr=fruit) %>% dim#[1] 3 3` – akrun Jun 22 '18 at 07:02
  • I see now. So just to be clear, is `gr` the quosure, or is `gr_` the quosure? But it appears that regardless, I can only interrogate and act on `gr_`, not `gr`? – MrGumble Jun 22 '18 at 07:06
  • I meant the `gr_` is a quosure and with `gr` if you use `print(gr)` within the function, it returns the vector of 'fruit' column. So, it is getting evaluated – akrun Jun 22 '18 at 07:09
  • Let me try on a fresh session in case there is a `fruit` object – akrun Jun 22 '18 at 07:10
  • I think `fruit` is a default vector dataset that gets in the way of evaluation, but anyway the function should work once the `is.na` check is changed to use `gr_`, given the right versions – akrun Jun 22 '18 at 07:13
  • Can't say about a default `fruit` vector, but I am clearly enlightened. Thanks for the help. Now I just need to solve how to evaluate whether the argument is missing, NA, or not. – MrGumble Jun 22 '18 at 07:28
  • Just so I understand: do you want the function to evaluate 'gr' as 'NA' when you specify the 'gr' argument, even if it is non-existent? – tjebo Jun 25 '18 at 07:30

1 Answer

A bit of a late answer, but the issue with your implementation of d_func is that you're mixing standard and non-standard evaluation of the same variable. You're using enquo to capture the expression given to gr in the calling environment (non-standard evaluation), while at the same time testing if the value held by the variable gr is NA (standard evaluation).

In case of standard evaluation (as in !is.na(gr)), gr will evaluate to the value held by the variable fruit, NOT the expression fruit. In your case, variable fruit was never defined. In akrun's case -- who likely did library(tidyverse) -- fruit evaluates to a pre-defined string vector that comes from stringr::fruit and contains various fruit names.
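
To make the distinction concrete, here is a minimal sketch. The helper name peek is hypothetical and exists only for illustration; quo_name() is the function mentioned in the comments. It captures the argument with enquo and then, separately, forces the argument the way is.na(gr) would:

```r
library(rlang)

# Hypothetical helper, for illustration only
peek <- function(gr) {
  gr_ <- enquo(gr)             # captures the expression; does not evaluate it
  captured <- quo_name(gr_)    # the expression as a string

  # Referring to gr directly forces it, i.e. looks up the symbol in the
  # caller's environment, which is what is.na(gr) does implicitly.
  forced <- tryCatch(gr, error = function(e) conditionMessage(e))

  list(captured = captured, forced = forced)
}

res <- peek(fruit)
res$captured   # "fruit"
res$forced     # an error message if no object named fruit exists,
               # or the contents of stringr::fruit if stringr is attached
```

The captured expression is the same in both sessions; only the forced value depends on what happens to be defined in the calling environment.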

In either case, the behavior is not desirable. Your goal is to perform a specific action only if gr was specified. R provides a primitive function missing() that can be used for this purpose. If you replace

if (!is.na(gr)) {

with

if (!missing(gr)) {

all four of your test cases will work as expected.
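
For reference, a sketch of d_func with that one-line change applied. The data frame is a trimmed copy of the one in the question (without the mean column), and the code assumes only that dplyr is attached:

```r
library(dplyr)

df <- data.frame(
  gah   = c("a", "a", "a", "a", "b", "b", "b", "b"),
  fruit = c("apple", "apple", "apple", "banana", "banana", "banana", "dog", "dog"),
  val   = 1:8,
  sss   = -7:0
)

d_func <- function(df, value = val, gr = NA) {
  value_ <- enquo(value)
  gr_ <- enquo(gr)
  # missing() asks whether the caller supplied gr at all; it never evaluates gr
  if (!missing(gr)) {
    df <- df %>% group_by(!!gr_)
  }
  df %>%
    summarise(mean = mean(!!value_), mean_val = mean(val))
}

d_func(df, sss)                              # one summary row
d_func(df, gr = fruit)                       # one row per fruit (3 rows)
df %>% group_by(gah) %>% d_func(gr = fruit)
df %>% group_by(gah) %>% d_func(sss, fruit)
```

Note that missing(gr) is only TRUE when the argument is omitted. A caller who passes gr = NA explicitly would still reach the group_by branch and group by a constant NA column, so with this approach the NA default serves as a placeholder rather than something that is tested.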

Artem Sokolov