2

I am trying to construct a formula using NSE so that I can easily pipe in columns. The following is my desired use case:

df %>% make_formula(col1, col2, col3)

[1] "col1 ~ col2 + col3"

I have made first this function:

varstring <- function(...) {
 as.character(match.call()[-1])
}

This works great with either single objects or multiple objects:

varstring(col)

[1] "col"

varstring(col1, col2, col3)

[1] "col1" "col2" "col3"

I create my function to create the formula next:

formula <- function(df, col, ...) {
 group <- varstring(col)
 vars <- varstring(...)

 paste(group,"~", paste(vars, collapse = " + "), sep = " ")
}

However, the function call formula(df, col, col1, col2, col3) produces [1] "group ~ ..1 + ..2 + ..3".

I understand that the formula is literally evaluating varstring(group) and varstring(...) and not actually substituting in the user supplied objects for evaluation like I would like it too. But I can not figure out how to make this work as intended.

Dylan Russell
  • 936
  • 1
  • 10
  • 29

4 Answers4

3

You can join an arbitrary number of arguments with a binary function by using reduce()

make_formula <- function(lhs, ..., op = "+") {
  lhs <- ensym(lhs)
  args <- ensyms(...)

  n <- length(args)

  if (n == 0) {
    rhs <- 1
  } else if (n == 1) {
    rhs <- args[[1]]
  } else {
    rhs <- purrr::reduce(args, function(out, new) call(op, out, new))
  }

  # Don't forget to forward the caller environment
  new_formula(lhs, rhs, env = caller_env())
}

make_formula(disp)
#> disp ~ 1

make_formula(disp, cyl)
#> disp ~ cyl

make_formula(disp, cyl, am, drat)
#> disp ~ cyl + am + drat

make_formula(disp, cyl, am, drat, op = "*")
#> disp ~ cyl * am * drat

One big advantage of working with expressions is that it's robust to little bobby tables (https://xkcd.com/327/):

# User inputs are always interpreted as symbols (variable name)
make_formula(disp, `I(file.remove('~'))`)
#> disp ~ `I(file.remove('~'))`

# With `paste()` + `parse()` user inputs are interpreted as arbitrary code
reformulate(c("foo", "I(file.remove('~'))"))
#> ~foo + I(file.remove("~"))
Lionel Henry
  • 6,652
  • 27
  • 33
  • One variant of this would be to allow arbitrary expressions by using `enexprs(...)` instead of `ensyms(...)`. Note that `enquos()` wouldn't work here because modelling functions do not support quosures. That means the `...` must all come from the caller environment, they can't be forwarded across different levels. – Lionel Henry Aug 28 '20 at 08:35
  • This is great! Thank you. I'm slowly figuring out NSE everyday. – Dylan Russell Aug 28 '20 at 17:55
  • I have added a parameter called `group = NULL` to the function and then added the following after your else statement: `if(deparse(substitute(group)) != "NULL")` `{ group <- rlang::ensym(group)` `rhs <- purrr::reduce(c(rhs, group), function(out, new) call("|", out, new)) }` This allows the option of making a formula with `|` like some packages use to facet by in making tables/graphs. Is this a safe way of accomplishing that? – Dylan Russell Aug 30 '20 at 05:40
  • The `deparse()` is not great. One thing you could do is to remove the `NULL` default and check if it was supplied with `missing()`. Or you can take it with `enexpr()` and check for `NULL`, and then check that a symbol was supplied. This allows users to unquote `NULL`, which can make composition of functions easier. – Lionel Henry Aug 30 '20 at 08:48
  • It could also be reasonable to drop the requirement of symbol inputs and allow arbitrary expressions with `enexpr()` and `enexprs()`, depending on your requirements. This could be occasionally useful for your users and might simplify your implementation. It should just be clear that all these expressions will be evaluated in a single environment because quosures are not supported. By the way, you could also allow `{{` (which normally unquotes a quosure) by using `quo_squash()` on the captured expressions. – Lionel Henry Aug 30 '20 at 08:58
  • Thanks! See my edits to my answer below, I think I've implemented what you have suggested. – Dylan Russell Aug 30 '20 at 17:35
2

I would suggest to use rlang::enquo(s) and rlang::as_name to achieve this:

library(rlang)

formula <- function(df, col, ...) {
  group <- enquo(col)
  vars <- enquos(...)

  group_str <- rlang::as_name(group)
  vars_str <- lapply(vars, rlang::as_name)
  
  paste(group_str,"~", paste(vars_str, collapse = " + "), sep = " ")
}

formula(mtcars, col, col1, col2, col3)
#> [1] "col ~ col1 + col2 + col3"
stefan
  • 90,330
  • 6
  • 25
  • 51
  • 1
    If you're going to use `as_name()` on a defused expression, better use `ensym()` than `enquo()`. The latter is a distraction (allows complex expressions and forwards the environment). – Lionel Henry Aug 28 '20 at 08:16
  • 1
    Note that pasting strings like this is brittle and potentially dangerous if strings come from a shiny app, see my answer for an example. – Lionel Henry Aug 28 '20 at 08:29
  • @LionelHenry Hi Lionel, thanks for the clarification. Sigh. Now I see that I have still a long way to go to get a better and deeper understanding of the topic. Time for having another read of Advanced R. (; Best S. – stefan Aug 28 '20 at 09:39
0

We could use reformulate

formula_fn <- function(dat, col, ...) {
           deparse(reformulate(purrr::map_chr(ensyms(...), rlang::as_string), 
                 response = rlang::as_string(ensym(col) )))
      
 }
formula_fn(mtcars, col, col1, col2, col3)
#[1] "col ~ col1 + col2 + col3"
akrun
  • 874,273
  • 37
  • 540
  • 662
0

I have taken the advice of @LionelHenry above and created the following function with some additional functionality not asked for in my initial question.

#' Create a formula
#'
#' Creates a new formula object to be used anywhere formulas are used (i.e, `glm`).
#'
#' @param ... any number of arguments to compose the formula
#' @param lhs a boolean indicating if the formula has a left hand side of the argument
#' @param op the operand acting upon the arguments of the right side of the formula.
#' @param group an argument to use as a grouping variable to facet by
#'
#' @return a formula
#'
#' @details If `lhs` is `TRUE`, the first argument provided is used as the left hand side of the formula.
#' The `group` paramenter will add `| group` to the end of the formula. This is useful for packages that support faceting by grouping variables for the purposes of tables or graphs.
#'
#' @export
#'
#' @examples
#' make_formula(var1, var2, var3)
#' make_formula(var1, var2, var3, lhs = FALSE)
#' make_formula(var1, var2, var3, lhs = FALSE, group = var4)
#'
make_formula <- function(..., lhs = TRUE, op = "+", group = NULL) {
  args <- rlang::ensyms(...)
  n <- length(args)
  group <- rlang::enexpr(group)

  if(lhs) {
    left <- args[[1]]
    if (n == 1) {
      right <- 1
    } else if (n == 2) {
      right <- args[[2]]
    } else {
      right <- purrr::reduce(args[-1], function(out, new) call(op, out, new))
    }
  } else {
    left <- NULL
    if (n == 1) {
      right <- args[[1]]
    } else {
      right <- purrr::reduce(args, function(out, new) call(op, out, new))
    }
  }

  if(!is.null(group)) {
    group <- rlang::ensym(group)
    right <- purrr::reduce(c(right, group), function(out, new) call("|", out, new))
  }

  rlang::new_formula(left, right, env = rlang::caller_env()) # Forward to the caller environment
}
Dylan Russell
  • 936
  • 1
  • 10
  • 29