4

I want to add a new column based on a given character vector. For example, in the example below, I want to add column d defined in expr:

library(magrittr)

data <- tibble::tibble(
  a = c(1, 2),
  b = c(3, 4)
)

expr <- "d = a + b"

just as below:

data %>%
  dplyr::mutate(d = a + b)

# # A tibble: 2 x 3
#       a     b     d
#   <dbl> <dbl> <dbl>
# 1     1     3     4
# 2     2     4     6

However, in the codes below, while the calculations themselves (i.e., adding) work, the names of the new columns are different from what I expected.

data %>%
  dplyr::mutate(!!rlang::parse_expr(expr))

# # A tibble: 2 x 3
#       a     b `d = a + b`
#   <dbl> <dbl>       <dbl>
# 1     1     3           4
# 2     2     4           6

data %>%
  dplyr::mutate(!!rlang::parse_quo(expr, env = rlang::global_env()))

# # A tibble: 2 x 3
#       a     b `d = a + b`
#   <dbl> <dbl>       <dbl>
# 1     1     3           4
# 2     2     4           6

data %>%
  dplyr::mutate(rlang::eval_tidy(rlang::parse_expr(expr)))

# # A tibble: 2 x 3
#       a     b `rlang::eval_tidy(rlang::parse_expr(expr))`
#   <dbl> <dbl>                                       <dbl>
# 1     1     3                                           4
# 2     2     4                                           6

How can I properly use an expression in dplyr::mutate?

My question is similar to this, but in my example, the new variable (d) and its definition (a + b) are given in a single character vector (expr).

Koopa
  • 77
  • 3

3 Answers3

3

Lets first look at what kind of expressions dplyr::mutate takes to create named variables: we need a named list that contains an expression to create variables based on that expression with the given list element name.

library(tidyverse)

data <- tibble::tibble(
  a = c(1, 2),
  b = c(3, 4)
)

expr <- "d = a + b"
# let's rewrite the string above as named list containing an expression.
expr2 <- list(d = expr(a + b))

# this works as expected:
data %>% 
  mutate(!!! expr2)

#> # A tibble: 2 x 3
#>       a     b     d
#>   <dbl> <dbl> <dbl>
#> 1     1     3     4
#> 2     2     4     6

Now we simply need a function that transforms a string into a named list containing the expression of the right-hand side of the equation. The name needs to be the left-hand side of the equation. We can do this with regular string manipulations. Finally we need to transform the right-hand side of the equation from a string into an expression. We can use base R's str2lang here.

create_expr_ls <- function(str_expr) {
  expr_nm <- str_extract(str_expr, "^\\w+")
  expr_code <- str_replace_all(str_expr, "(^\\w+\\s?=\\s?)(.*)", "\\2")
  set_names(list(str2lang(expr_code)), expr_nm)
}

expr3 <- create_expr_ls(expr)

data %>% 
  mutate(!!! expr3)

#> # A tibble: 2 x 3
#>       a     b     d
#>   <dbl> <dbl> <dbl>
#> 1     1     3     4
#> 2     2     4     6

Created on 2022-01-23 by the reprex package (v0.3.0)

TimTeaFan
  • 17,549
  • 4
  • 18
  • 39
1

To get the desired name for the mutated column, you can still use the same syntax and assign the results to a column with the preferred name. To get this name you can use a regular expression to find what is before = and then remove any leading or trailing spaces that might exist.

expr <- "x = a * b"
col_name <- trimws(str_extract(expr,"[^=]+"))

data %>%
   dplyr::mutate(!!col_name := !!rlang::parse_expr(expr))
# A tibble: 2 × 3
      a     b     x
  <dbl> <dbl> <dbl>
1     1     3     3
2     2     4     8

data %>%
   dplyr::mutate(!!col_name := !!rlang::parse_quo(expr, env = rlang::global_env()))
# A tibble: 2 × 3
      a     b     x
  <dbl> <dbl> <dbl>
1     1     3     3
2     2     4     8
 
data %>%
   dplyr::mutate(!!col_name := rlang::eval_tidy(rlang::parse_expr(expr)))
# A tibble: 2 × 3
      a     b     x
  <dbl> <dbl> <dbl>
1     1     3     3
2     2     4     8
ekolima
  • 588
  • 4
  • 8
1

Any of these work. The second is similar to the first but does not require that rlang be on the search path. The third and fourth also work if the d= part is not present in expr in which case default names are used. The last one uses only base R and is also the shortest.

data %>% mutate(within(., !!parse_expr(expr)))

data %>% mutate(within(., !!parse(text = expr)))

data %>% mutate(data, !!parse_expr(sprintf("tibble(%s)", expr)))

data %>% { eval_tidy(parse_expr(sprintf("mutate(., %s)", expr))) }

within(data, eval(parse(text = expr)))  # base R

Note

Assume this premable:

library(dplyr)
library(rlang)

# input
data <- tibble(a = c(1, 2), b = c(3, 4))
expr <- "d = a + b"
Anoushiravan R
  • 21,622
  • 3
  • 18
  • 41
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341