
I want to write a function that accepts column names both as bare symbols and as strings (including strings stored in a variable).

Let me show you an example:

The data:

> ( d <- data.frame(A=1:3, B=3:1) )
  A B
1 1 3
2 2 2
3 3 1

Now my function:

library(dplyr)

fn <- function(data, cols) {
  return(data %>% mutate(across({{cols}}, ~. * 2)))
}

It works well for:

A) symbolic names

> d %>% fn(cols = A)
  A B
1 2 3
2 4 2
3 6 1

> d %>% fn(cols = B)
  A B
1 1 6
2 2 4
3 3 2

> d %>% fn(cols = c(A, B))
  A B
1 2 6
2 4 4
3 6 2

B) names passed as strings

> column <- "A"
> d %>% fn(cols = column)
  A B
1 2 3
2 4 2
3 6 1

> d %>% fn(cols = c("A", "B"))
  A B
1 2 6
2 4 4
3 6 2

So far, so good!

But when I pass an external vector holding more than one column name, it prints a note:

> columns <- c("A", "B")
> d %>% fn(cols = columns)
Note: Using an external vector in selections is ambiguous.
i Use `all_of(columns)` instead of `columns` to silence this message.
i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
This message is displayed once per session.
  A B
1 2 6
2 4 4
3 6 2

So I wrapped the selection in all_of(), which works well for strings:

fn <- function(data, cols) {
  return(data %>% mutate(across(all_of({{cols}}), ~. * 2)))
}

> d %>% fn(cols = columns)
  A B
1 2 6
2 4 4
3 6 2

but it throws an error when I pass a bare (symbolic) column name:

> d %>% fn(cols = A)

 Error: Problem with `mutate()` input `..1`.
x object 'A' not found
i Input `..1` is `across(all_of(A), ~. * 2)`.
Run `rlang::last_error()` to see where the error occurred.
> d %>% fn(cols = B)

> d %>% fn(cols = c(A, B))

 Error: Problem with `mutate()` input `..1`.
x object 'A' not found
i Input `..1` is `across(all_of(c(A, B)), ~. * 2)`.
Run `rlang::last_error()` to see where the error occurred. 

How can I fix this so that both approaches work and the note is avoided?

Artem Sokolov
Bastian
  • So you want the function to work for both symbols as well as strings? – Ronak Shah Dec 21 '20 at 06:16
  • Yes, exactly. And it works well as long as the column names are provided explicitly: fn(A, B) or fn("A", "B"). When I provide an external vector, it prints a note about the selection being ambiguous. In the future this note will be turned into a warning, and then into an error. Theoretically I could provide 2 functions, like fn() for NSE and fn_() for SE, but I'd really like to avoid that. Maybe some conditional checking on the parameters? – Bastian Dec 21 '20 at 06:23
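For reference, the two-function split mentioned in that comment could be sketched like this, assuming dplyr ≥ 1.0 and tidyselect ≥ 1.0. fn_() takes a plain character vector (standard evaluation); fn() accepts anything tidyselect understands and resolves it to column names with tidyselect::eval_select() before delegating to fn_(). Routing fn() through eval_select() is an assumption for illustration, not something proposed in the thread.

```r
library(dplyr)

# Standard-evaluation version: `cols` must be a character vector of names
fn_ <- function(data, cols) {
  data %>% mutate(across(all_of(cols), ~ . * 2))
}

# Non-standard-evaluation version: accepts bare names, c(A, B), helpers such
# as starts_with(), or literal strings, and resolves them to column names
fn <- function(data, cols) {
  # eval_select() returns a named integer vector of selected column positions
  pos <- tidyselect::eval_select(rlang::enquo(cols), data)
  fn_(data, names(pos))
}
```

With this split, code that receives column names from a web service would call fn_(d, columns) and never see the note, while interactive users keep fn(d, c(A, B)). Passing a string variable to fn() still goes through tidyselect, so it would still print the note.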

1 Answer


My suggestion would be to keep your original implementation and the warning that comes with it, because the situation really is ambiguous. Consider:

d <- data.frame(A=1:3, B=3:1, columns=4:6)  # Note the new column named columns
columns <- c("A","B")
d %>% fn(cols = columns)                    # Which `columns` should this use?

The users of your function can then resolve the ambiguity by using all_of() themselves, and you can document this in the function's help page.

d %>% fn(cols = all_of(columns))     # works without a warning
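To make the ambiguity concrete, here is a sketch using the question's original {{ }} version of fn() (assuming dplyr ≥ 1.0). When the data frame has a column literally named columns, tidyselect looks the bare symbol up in the data first, so the data column wins and the external vector is silently ignored; wrapping it in all_of() forces the value to be taken from the environment instead.

```r
library(dplyr)

# The question's original implementation
fn <- function(data, cols) {
  data %>% mutate(across({{ cols }}, ~ . * 2))
}

d <- data.frame(A = 1:3, B = 3:1, columns = 4:6)
columns <- c("A", "B")

# Bare symbol: resolved against the data first, so only the column
# named `columns` is doubled; A and B are left unchanged
fn(d, cols = columns)

# all_of(): the external vector is used, so A and B are doubled
fn(d, cols = all_of(columns))
```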

EDIT: While I recommend the above approach, another way is to check for the existence of the variable in the calling environment. If the variable exists, assume that it contains column names and use it in all_of(); otherwise, assume that the column names are provided as is:

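fn <- function(data, cols) {
  # Deparse the unevaluated argument and check whether a variable with
  # that name exists in the environment fn() was called from
  varExist <- rlang::enexpr(cols) %>% 
    rlang::expr_deparse() %>%
    exists(envir=rlang::caller_env())
  
  if(varExist)
    # External variable: treat its value as a vector of column names
    data %>% mutate( across(all_of(cols), ~. *2) )
  else
    # Otherwise pass the argument straight through to tidyselect
    data %>% mutate( across({{cols}}, ~. * 2) )
}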

rm(A)              # Ensure there is no variable called A
d %>% fn(cols=A)   # Mutate will operate on column A only

A <- c("A","B")    # A now contains column names
d %>% fn(cols=A)   # Mutate will operate on A and B
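A stricter variant of the same check can be sketched as follows. This is an illustrative assumption, not something tested in the thread: fn2() (a hypothetical name) only treats cols as an external vector when the argument is a bare symbol that is not already a column of data and is bound to a character vector in the calling environment. Anything else, including c(A, B), "A" or all_of(x), is passed straight through to tidyselect.

```r
library(dplyr)

# Hypothetical stricter variant of fn(): only a bare symbol that is not a
# column of `data` and is bound to a character vector in the caller's
# environment is treated as an external vector of column names
fn2 <- function(data, cols) {
  expr <- rlang::enexpr(cols)     # unevaluated argument
  env  <- rlang::caller_env()     # environment fn2() was called from
  is_external <- rlang::is_symbol(expr) &&
    !(rlang::as_string(expr) %in% names(data)) &&
    exists(rlang::as_string(expr), envir = env) &&
    is.character(get(rlang::as_string(expr), envir = env))

  if (is_external) {
    # Forcing `cols` yields the character vector from the caller
    data %>% mutate(across(all_of(cols), ~ . * 2))
  } else {
    # Bare names, c(...), strings and helpers go to tidyselect unchanged
    data %>% mutate(across({{ cols }}, ~ . * 2))
  }
}
```

The design differs from fn() above in one respect: if both a variable A and a column A exist, fn2() follows tidyselect's own rule and selects the column, whereas fn() uses the variable. It also avoids treating unrelated objects (for example a function that happens to share the symbol's name) as column names.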
Artem Sokolov
  • Ah, got it, I put it in the wrong place! Unfortunately, this will be used mostly in a dynamic script, where the column names will be received from a web service, but the function will also be available to end users who aren't familiar with dplyr and NSE. They will "automatically" provide a vector of strings. I cannot count on them reading the manual (in theory I could, in practice I'd get killed by my manager). A column named "columns" is very unlikely in the data frames of the production environment I work with. I have to find another way... or give up on dplyr for this task. – Bastian Dec 21 '20 at 06:07
  • All the more so as this is now a note, but in the future it will become an error, which will break the entire code. – Bastian Dec 21 '20 at 06:10
  • @Bastian It sounds like your function will be primarily used with standard evaluation in production. My suggestion would be to commit to that and drop NSE support. However, please see my edit, if you absolutely must support both. – Artem Sokolov Dec 21 '20 at 16:20
  • That's beautiful work. rlang is so powerful; I finally have to explore it, as it opens up incredible possibilities in R. Thank you very much, Artem. This absolutely fits my needs. – Bastian Dec 21 '20 at 19:53