I found this very helpful article on how to write a function accepting variable arguments using quosure and tidy dots. Here's some of the code:

my.summary <- function(df.name=df_tp1, group_var, ...) {
    group_var <- enquo(group_var)
    smry_vars <- enquos(..., .named = TRUE)

    the.mean <- purrr::map(smry_vars, function(var) {
        expr(mean(!!var, na.rm = TRUE))
    })
    names(the.mean) <- paste0("mean-", names(the.mean))

   df.name %>%
        group_by(!!group_var) %>%
        summarise(!!!the.mean)
}

The problem is I have to call the function with a long string of variables, like this:

cm_all1 <- my.summary(df_tp1_cm, group_var=net_role, so_part_value, cult_ci, cult_sn, cult_ebc, sl_t_lrn, sl_xt_lrn, nl_netops_km, so_rt, nl_netops_trust)

I would be very happy to be able to just call it with something like

so_part_value:nl_netops_trust

instead, but this gives errors like this:

Error in so_part_value:nl_netops_trust : NA/NaN argument

I also tried putting the variable names in a character vector and then using enquo() and !! but that didn't work.

I'd appreciate any ideas.

Here is my rewrite of the function using Yifu's ideas. This works for my fake data set but not the real data.

my.summary <- function(df.name=df_tp1, group_var, ...) {
##    group_var <- enquo(group_var)
    smry_vars <- df.name %>% select(...) %>% colnames()

    df.name %>%
        ##        group_by(!!group_var) %>%
        group_by({{group_var}}) %>%
        summarise_at(smry_vars,
                     list(mean=function(x) mean(x, na.rm=TRUE),
                          sd=function(x) sd(x, na.rm=TRUE),
                          min=function(x) min(x, na.rm=TRUE),
                          max=function(x) max(x, na.rm=TRUE),
                          q1=function(x) quantile(x, .25, na.rm=TRUE),
                          q2=function(x) quantile(x, .50, na.rm=TRUE),
                          q3=function(x) quantile(x, .75, na.rm=TRUE),
                          n=function(x) n()
                          ))
}
yusuzech
Stuart
  • Yes, right. tidy dots==dot dot dot – Stuart Jul 30 '19 at 23:20
  • I know you didn't write the linked page, but that's a pretty interesting renaming. I'm pretty sure the `...` (ellipsis) has been around since the days of S in the 1980s, as a precursor to R even existing. – thelatemail Jul 30 '19 at 23:38
  • Here's a similar question I posted a while ago https://stackoverflow.com/questions/50555526/rlang-get-names-from-with-colon-shortcut-in-nse-function – camille Jul 30 '19 at 23:54
  • In this case using `summarise_at` would be quite a lot easier – alistaire Jul 31 '19 at 00:16

1 Answer

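You just need to make sure ... is evaluated in the right context: the data frame you pass in (df in this example). Passing ... to select() does that, because select() resolves a range such as a:b against the data frame's columns. You can then use colnames() to extract the column names.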

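library(dplyr)  # provides %>% and select()
library(rlang)  # provides syms()
get_column_range <- function(df,...){
    # select() resolves ... (including ranges like a:b) against df's columns
    writeLines("Column names as string:")
    print(df %>% select(...) %>% colnames())
    writeLines("Convert back to symbols")
    print(syms(df %>% select(...) %>% colnames()))
}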

get_column_range(df = iris,Sepal.Length:Petal.Width)
Column names as string:
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
Convert back to symbols
[[1]]
Sepal.Length

[[2]]
Sepal.Width

[[3]]
Petal.Length

[[4]]
Petal.Width

dplyr functions with the _at suffix also accept column names as strings, so you don't have to convert the names to quosures and then unquote them.
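As a minimal sketch of that point (the variable names cols, by_string and by_vars are made up for illustration), a character vector can go straight into summarise_at(), and it gives the same result as bare names wrapped in vars():

```r
library(dplyr)

# Column names held as plain strings (hypothetical variable name)
cols <- c("Sepal.Length", "Petal.Width")

# summarise_at() takes the character vector directly; no quoting or unquoting
by_string <- iris %>%
    group_by(Species) %>%
    summarise_at(cols, mean)

# The same summary written with bare column names wrapped in vars()
by_vars <- iris %>%
    group_by(Species) %>%
    summarise_at(vars(Sepal.Length, Petal.Width), mean)
```

With a single unnamed function the output columns keep their original names, so both results have the columns Species, Sepal.Length and Petal.Width.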

Note that {{}} is an easier syntax to learn; it quotes and unquotes in a single step:

my.summary <- function(df,group_var,...){
    column_names <- df %>% select(...) %>% colnames()

    df %>%
        group_by({{group_var}}) %>%
        summarise_at(column_names,list(mean = mean))
}

my.summary(df = iris,group_var = Species,Sepal.Length:Petal.Width)
# A tibble: 3 x 5
  Species    Sepal.Length_mean Sepal.Width_mean Petal.Length_mean Petal.Width_mean
  <fct>                  <dbl>            <dbl>             <dbl>            <dbl>
1 setosa                  5.01             3.43              1.46            0.246
2 versicolor              5.94             2.77              4.26            1.33 
3 virginica               6.59             2.97              5.55            2.03 

For more, see: https://rlang.r-lib.org/reference/quotation.html
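A side note, assuming dplyr 1.0.0 or later is installed: in those versions summarise_at() is superseded by across(). The sketch below is not part of the answer as originally written. It keeps the same select(...) %>% colnames() step and only swaps the summarising call; the function name my_summary_across is hypothetical.

```r
library(dplyr)

# Hypothetical variant of my.summary() for dplyr >= 1.0.0
my_summary_across <- function(df, group_var, ...) {
    # Resolve ... (for example a range like a:b) to column names, as before
    column_names <- df %>% select(...) %>% colnames()

    df %>%
        group_by({{ group_var }}) %>%
        # all_of() selects columns from a character vector
        summarise(across(all_of(column_names), list(mean = mean)))
}

res_across <- my_summary_across(iris, Species, Sepal.Length:Petal.Width)
```

By default across() names the output columns as column_function, so the result should have the same column names as the summarise_at() version, for example Sepal.Length_mean.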

yusuzech
  • This is great. It works fine for my fake data frame: my.summary(df, grp, x:z) but when I run it with my real data frame: cm_all1 <- my.summary(df_tp1_cm, group_var=net_role, so_part_value:nl_netops_trust) I get this message: Error in is_string(x) : object 'so_part_value' not found (Of course, that column, so_part_value, is in the data frame.) Can't figure it out. – Stuart Jul 31 '19 at 20:49
  • Using the first example function `get_column_range()`, Are you able to print the column names? – yusuzech Jul 31 '19 at 20:58
  • If that's the case, then the columns should be correctly selected. My solution actually uses a different approach than the post you mentioned. I'm using `summarise_at()` which accepts character vector as input. If you are not using `summarise_at()`, you need to convert the column names to symbol using `syms(column_names)` or to quosure using `quos(!!!syms(column_names))`. – yusuzech Jul 31 '19 at 23:12
  • I am using summarise_at(), just as you wrote in your explanation. The weird thing is that it works for my fake data frame, but not for the real data. – Stuart Jul 31 '19 at 23:51
  • Then that's really strange. Do you have any code in the function that modifies the column names? Because the error message means the column name doesn't exist. – yusuzech Jul 31 '19 at 23:55
  • I found a typo in your function, I changed `df` to `df.name` and the function works in iris and diamonds data set. I'm not able to replicate your error. – yusuzech Aug 01 '19 at 15:29
  • OK, that fixed it. I'm a dummy. Thanks very much, Yifu. – Stuart Aug 01 '19 at 16:24
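
The conversion mentioned in the comments above (turning the column names back into symbols with `syms()` when not using `summarise_at()`) can be sketched roughly as follows. This is an illustrative combination of the question's `!!!` approach with the answer's `select(...)` step, not code posted in the thread; the names my_summary_syms and mean_exprs are made up.

```r
library(dplyr)
library(rlang)

# Hypothetical variant: build one mean() call per selected column
my_summary_syms <- function(df, group_var, ...) {
    # Resolve ... (for example a range like a:b) to column names
    column_names <- df %>% select(...) %>% colnames()

    # Turn each name into a symbol and wrap it in an unevaluated mean() call
    mean_exprs <- lapply(syms(column_names), function(var) {
        expr(mean(!!var, na.rm = TRUE))
    })
    names(mean_exprs) <- paste0("mean_", column_names)

    df %>%
        group_by({{ group_var }}) %>%
        # !!! splices the named list of calls into summarise()
        summarise(!!!mean_exprs)
}

res_syms <- my_summary_syms(iris, Species, Sepal.Length:Petal.Width)
```

The output columns are named after the list names, for example mean_Sepal.Length.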