
I want to apply different functions to the same column in a tibble. These functions are stored in a character string. I used to do this with mutate_ and the .dots argument like this:

library(dplyr)

myfuns <- c(f1 = "a^2", f2 = "exp(a)", f3 = "sqrt(a)")
tibble(a = 1:3) %>% 
  mutate_(.dots = myfuns)

This approach still works, but mutate_ is deprecated. I tried to achieve the same result with mutate and the rlang package but did not get very far.

In my real example myfuns contains about 200 functions so typing them one by one is not an option.

Thanks in advance.

Cettt
  • “These functions are stored in a character string.” — This is fundamentally a bad idea. Where do these functions originate? If in code, store them as unevaluated expressions or actual functions. Don’t (ab)use strings to represent code. – Konrad Rudolph Jul 08 '19 at 12:48
  • Hm, it is a long story. Basically each component is a lengthy formula which can be "spelled out" with a certain formula. That's why they are stored in character strings. I did not (and still don't) know any other way to circumvent this. – Cettt Jul 08 '19 at 12:50
  • If it can be spelled as a string then surely it can also be spelled as code? Just remove the surrounding `"…"` and either use a formula (`~ …`) or a *function* (`function (x) …`). – Konrad Rudolph Jul 08 '19 at 13:13
  • thank you Konrad, I will try that. – Cettt Jul 08 '19 at 13:14
  • So I just realised that AntoniosK’s answer essentially suggests the same, except that you can simplify the code slightly because there’s no reason to wrap a pure function call into a formula: `myfuns = c(f1 = ~ . ^ 2, f2 = exp, f3 = sqrt)`. To do the same with multiple variables you’ll need to use quosures/rlang, though. – Konrad Rudolph Jul 08 '19 at 13:16
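As a minimal base R sketch of the "store actual functions, not strings" suggestion (the names `myfuns` and `df` are only illustrative), the transformations can live in a named list of functions and be applied to the column directly, with no parsing step:

```r
# Store the transformations as functions rather than as strings
myfuns <- list(f1 = function(x) x^2, f2 = exp, f3 = sqrt)

df <- data.frame(a = 1:3)
# Apply every function to column `a`; the list names become the new column names
df[names(myfuns)] <- lapply(myfuns, function(f) f(df$a))
df
```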

6 Answers


For simple equations that take a single input, it’s sufficient to supply the function itself, e.g.

iris %>% mutate_at(vars(-Species), sqrt)

Or, when using an equation rather than a simple function, via a formula:

iris %>% mutate_at(vars(-Species), ~ . ^ 2)
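In dplyr 1.0.0 and later, the `_at()` variants are superseded by `across()`. Assuming a recent dplyr is installed, the two calls above can be written roughly like this (the result names `iris_sqrt` and `iris_sq` are just for illustration):

```r
library(dplyr)

# Apply a plain function to every column except Species
iris_sqrt <- iris %>% mutate(across(-Species, sqrt))

# Apply an inline expression; `.x` stands for the current column
iris_sq <- iris %>% mutate(across(-Species, ~ .x^2))
```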

When using equations that access more than a single variable, you need to use rlang quosures instead:

area = quo(Sepal.Length * Sepal.Width)
iris %>% mutate(Sepal.Area = !! area)

Here, quo creates a “quosure” — i.e. a quoted representation of your equation, same as your use of strings, except, unlike strings, this one is properly scoped, is directly usable by dplyr, and is conceptually cleaner: It is like any other R expression, except not yet evaluated. The difference is as follows:

  • 1 + 2 is an expression with value 3.
  • quo(1 + 2) is an unevaluated expression with value 1 + 2 that evaluates to 3, but it needs to be explicitly evaluated. So how do we evaluate an unevaluated expression? Well …:

Then !! (pronounced “bang bang”) unquotes the previously quoted expression: it inserts the expression into the mutate call, so that it is evaluated inside the context of mutate. This is important, because Sepal.Length and Sepal.Width are only known inside the mutate call, not outside of it.
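To see the quote-then-evaluate cycle outside of `mutate`, here is a small sketch using rlang's `eval_tidy()`, which evaluates a quosure and, if given `data`, looks names up in that data frame first, much as `mutate` does. The variable names are illustrative:

```r
library(rlang)

# A quosure holds the expression without evaluating it
q <- quo(1 + 2)
q_value <- eval_tidy(q)  # explicit evaluation yields 3

# This expression refers to columns that only exist inside iris
area <- quo(Sepal.Length * Sepal.Width)
# Supplying `data` makes those column names visible during evaluation
area_value <- eval_tidy(area, data = iris)
```

Without `data = iris`, evaluating `area` fails, because `Sepal.Length` is not defined in the calling environment.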


In all the cases above, the expressions can be inside a list, too. The only difference is that for lists you need to use !!! instead of !!:

funs = list(
    Sepal.Area = quo(Sepal.Length * Sepal.Width),
    Sepal.Ratio = quo(Sepal.Length / Sepal.Width)
)

iris %>% mutate(!!! funs)

The !!! operation is known as “unquote-splice”. The idea is that it “splices” the list elements of its arguments into the parent call. That is, it seems to modify the call as if it contained the list elements verbatim as arguments (this only works in functions, such as mutate, that support it, though).
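The splicing can be observed directly with `rlang::expr()`, which builds a call without evaluating it. In this sketch `df` never needs to exist, since nothing is evaluated:

```r
library(rlang)

# A named list of unevaluated expressions
funs <- exprs(f1 = a^2, f2 = exp(a))

# `!!!` splices each list element into the call as a separate named argument
spliced <- expr(mutate(df, !!!funs))
print(spliced)
# mutate(df, f1 = a^2, f2 = exp(a))
```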

Konrad Rudolph
  • Thank you, but how to work with expressions inside a list? Setting `area2 = list(f1 = quo(Sepal.Length * Sepal.Width), f2 = quo(Sepal.Length + Sepal.Width))` does not work and using `mutate(!! area2)` does not work – Cettt Jul 08 '19 at 14:00
  • @Cettt Apologies, I should have added the code for that. See modified answer. – Konrad Rudolph Jul 08 '19 at 15:02

Convert your strings to expressions

myexprs <- purrr::map( myfuns, rlang::parse_expr )

then pass those expressions to regular mutate using quasiquotation:

tibble(a = 1:3) %>% mutate( !!!myexprs )
# # A tibble: 3 x 4
#       a    f1    f2    f3
#   <int> <dbl> <dbl> <dbl>
# 1     1     1  2.72  1   
# 2     2     4  7.39  1.41
# 3     3     9 20.1   1.73

Note that this will also work with strings / expressions involving multiple columns.
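For example, with the multi-column strings Cettt gives in the comments (the two-column tibble here is made up for illustration), the same two steps produce one new column per string:

```r
library(dplyr)

# Each string may reference any column of the data
myfuns <- c(f1 = "a^2", f2 = "a+b", f3 = "sqrt(b)")

# parse_expr() turns one string into an expression; map() keeps the names f1, f2, f3
myexprs <- purrr::map(myfuns, rlang::parse_expr)

res <- tibble(a = 1:3, b = c(4, 9, 16)) %>% mutate(!!!myexprs)
```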

Artem Sokolov
  • You can use rlang::parse_exprs (plural of parse_expr) instead of using purrr::map. Even inline: mutate(!!! parse_exprs(myfuns)) – zeehio Jul 08 '19 at 18:37
  • @zeehio `parse_exprs` does not preserve the expression names (`f1`, `f2`, `f3` in this case), which become column names for the results. I opened a [GitHub issue](https://github.com/r-lib/rlang/issues/808) about it earlier today. – Artem Sokolov Jul 08 '19 at 19:21
  • Apologies for the misleading comment, I did not consider that issue :-S And thanks for the github issue! – zeehio Jul 09 '19 at 07:38

You have only one column, so both approaches below give the same result.

You only have to turn the strings in your list into formulas, using . to stand for the column.

library(dplyr)

myfuns <- c(f1 = ~.^2, f2 = ~exp(.), f3 = ~sqrt(.))

tibble(a = 1:3) %>% mutate_at(vars(a), myfuns)

tibble(a = 1:3) %>% mutate_all(myfuns)


# # A tibble: 3 x 4
#       a    f1    f2    f3
#   <int> <dbl> <dbl> <dbl>
# 1     1     1  2.72  1   
# 2     2     4  7.39  1.41
# 3     3     9 20.1   1.73
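In dplyr 1.0.0 and later, `mutate_at()` and `mutate_all()` are superseded by `across()`. A sketch of the same transformation, assuming a recent dplyr; the `.names = "{.fn}"` pattern names each output column after its list element:

```r
library(dplyr)

# Named list of lambdas; `.x` is the column being transformed
myfuns <- list(f1 = ~ .x^2, f2 = ~ exp(.x), f3 = ~ sqrt(.x))

res <- tibble(a = 1:3) %>%
  mutate(across(a, myfuns, .names = "{.fn}"))
```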
AntoniosK
  • Great, this works perfectly if all "mutations" concern the same variable, as was the case in my question. However, I am more interested in a solution where multiple variables can be mutated with possibly different functions. – Cettt Jul 08 '19 at 12:43
  • You mean that each variable will be mutated using its own specific set of functions? – AntoniosK Jul 08 '19 at 12:50
  • Yes something like `myfuns = c(f1 = "a^2", f2 = "a+b", f3 = "sqrt(b)")`. – Cettt Jul 08 '19 at 12:51
  • This is different because each one of those functions could use any column, or combinations of your columns. I think it's better if you post another question with a simple example. Is that something you used to do with `mutate_(.dots = …)`? – AntoniosK Jul 08 '19 at 13:01
  • Yeah I know, I did not realize that it would make a big difference because I got my mind set on `rlang` based solutions. I will wait a bit longer and ask another question if necessary. `mutate_(.dots = ...)` could handle multiple variables with multiple functions. – Cettt Jul 08 '19 at 13:03

A base R alternative:

myfuns <- c(f1 = "a^2", f2 = "exp(a)", f3 = "sqrt(a)")
df <- data.frame(a = 1:3)
df[names(myfuns)] <- lapply(myfuns , function(x) eval(parse(text= x), envir = df))
df
#>   a f1        f2       f3
#> 1 1  1  2.718282 1.000000
#> 2 2  4  7.389056 1.414214
#> 3 3  9 20.085537 1.732051

Created on 2019-07-08 by the reprex package (v0.3.0)
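On R 3.6.0 or later, `str2lang()` is a shorter way to turn one string into a call. A sketch of the same base R idea with expressions that use two columns (the column `b` and its values are made up for illustration):

```r
myfuns <- c(f1 = "a^2", f2 = "a+b", f3 = "sqrt(b)")
df <- data.frame(a = 1:3, b = c(4, 9, 16))

# str2lang() parses one string into a call; eval() looks up `a` and `b` in df first
df[names(myfuns)] <- lapply(myfuns, function(x) eval(str2lang(x), envir = df))
df
```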

moodymudskipper

One way, using parse_expr from rlang:

library(tidyverse)
library(rlang)

tibble(a = 1:3) %>% 
   mutate(ans =  map(myfuns, ~eval(parse_expr(.)))) %>%
   #OR mutate(ans =  map(myfuns, ~eval(parse(text  = .)))) %>%
   unnest(ans) %>%
   group_by(a) %>%
   mutate(temp = row_number()) %>%
   spread(a, ans) %>%
   select(-temp) %>%
   rename_all(~names(myfuns))

# A tibble: 3 x 3
#    f1    f2    f3
#  <dbl> <dbl> <dbl>
#1     1  2.72  1   
#2     4  7.39  1.41
#3     9  20.1  1.73
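For comparison, a shorter sketch of the same idea that skips the reshaping: evaluate each parsed string against the data with `rlang::eval_tidy()` and collect the named results into a tibble. The `df` here is illustrative:

```r
library(rlang)

myfuns <- c(f1 = "a^2", f2 = "exp(a)", f3 = "sqrt(a)")
df <- tibble::tibble(a = 1:3)

# Evaluate every string as an expression in which df's columns are visible;
# lapply() keeps the names f1, f2, f3, which become the column names
res <- tibble::as_tibble(
  lapply(myfuns, function(x) eval_tidy(parse_expr(x), data = df))
)
```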
Ronak Shah

You can also try a purrr approach (it uses the example data `df` defined at the end of this answer):

# define the functions
f1 <- function(a) a^2
f2 <- function(a, b) a + b
f3 <- function(b) sqrt(b)

# put all functions in one list
tibble(funs=list(f1, f2, f3)) %>%
  # give each function a name 
  mutate(fun_id=paste0("f", row_number())) %>% 
  # add to each row/function the matching column profile
  # first extract the column names you specified in each function 
  #mutate(columns=funs %>% 
  #         toString() %>% 
  #         str_extract_all(., "function \\(.*?\\)", simplify = T) %>% 
  #         str_extract_all(., "(?<=\\().+?(?=\\))", simplify = T) %>%
  #         gsub(" ", "", .) %>% 
  #         str_split(., ",")) %>%
  # with the help of Konrad we can use fn_fmls_names
  mutate(columns=map(funs, ~ rlang::fn_fmls_names(.)))  %>% 
  # select the columns and add to our tibble/data.frame  
  mutate(params=map(columns, ~select(df, .))) %>% 
  # invoke the functions
  mutate(results = invoke_map(.f = funs, .x = params)) %>% 
  # transform  to desired output
  unnest(results) %>% 
  group_by(fun_id) %>% 
  mutate(n=row_number()) %>% 
  spread(fun_id, results) %>% 
  left_join(mutate(df, n=row_number()), .) %>% 
  select(-n)
Joining, by = "n"
# A tibble: 5 x 5
      a     b    f1    f2    f3
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     2     1     4     3  1   
2     4     1    16     5  1   
3     5     2    25     7  1.41
4     7     2    49     9  1.41
5     8     2    64    10  1.41

Example data (create it before running the code above):

df <- tibble(
  a = c(2, 4, 5, 7, 8),
  b = c(1, 1, 2, 2, 2))
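A shorter base R sketch of the same idea, assuming (as the answer does) that each function's argument names match column names of `df`: look the argument names up with `formals()` and call each function with `do.call()`:

```r
f1 <- function(a) a^2
f2 <- function(a, b) a + b
f3 <- function(b) sqrt(b)
funs <- list(f1 = f1, f2 = f2, f3 = f3)

df <- data.frame(a = c(2, 4, 5, 7, 8), b = c(1, 1, 2, 2, 2))

# For each function, pass the columns whose names match its arguments
df[names(funs)] <- lapply(funs, function(f) do.call(f, as.list(df[names(formals(f))])))
df
```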
Roman
  • I’m sorry to be blunt but this is *terrible* code, and that’s exactly the reason why strings shouldn’t be used to work with R expressions. The detour is conceptually complex and this complexity materialises in your code. There is *no reason* to go the detour via strings. – Konrad Rudolph Jul 08 '19 at 15:05
  • @KonradRudolph no problem. I know that it is not an elegant solution. Do you know a better way to extract the function terms? – Roman Jul 08 '19 at 15:10
  • See my answer. If you have simple functions you can use them directly (`… %>% mutate_at(vars(-Species), list(f1, f2, f3))`). – Konrad Rudolph Jul 08 '19 at 15:16
  • @KonradRudolph But this is not possible with the example data and functions: `df %>% mutate_at(vars(a, b), list(f1, f2, f3))`, right? See the answer of Artem Sokolov. This is perfect. I only knew `invoke_map` and tried to find a solution with this function, sorry for looking terrible. You are free to add the needed column names by yourself beforehand ;) – Roman Jul 08 '19 at 15:22
  • It’s possible with `f1` and `f3` but not `f2`, which has two arguments, and `mutate` will invoke it with only one argument. The `!!!` solution will work (only) if the relevant variable names exist in the data.frame, and only on quosures. To transform a list of functions to a list of quosures you need the slightly unsightly `map(list(f1, f2, f3), ~ as_quosure(fn_body(.), NULL))`. I don’t understand why there isn’t a direct function to do this conversion, but I may simply be overlooking it. – Konrad Rudolph Jul 08 '19 at 15:37
  • @KonradRudolph thanks. edited my answer and used `rlang::fn_fmls_names(.)` to get the names of the function arguments. – Roman Jul 08 '19 at 15:50
  • @Konrad, I don't get when Artem's solution wouldn't work (assuming it's what you called "the `!!!` solution"). You can define `a <- 1:3` outside of the data.frame and replace the `a` column by `b`, and it works perfectly fine. – moodymudskipper Jul 08 '19 at 16:34
  • @Moody_Mudskipper Artem’s solution works because he isn’t using functions, he’s using unevaluated expressions. – Konrad Rudolph Jul 08 '19 at 17:15
  • @KonradRudolph You're right. Even though the OP uses the term "function", I sort of assumed that `myfuns` was actually a set of expressions defined in the context of some given data frame, NOT proper functions with their own scope. For the record, I don't condone writing code with strings either, but the question of "how do I use code text inside dplyr verbs?" seems to come up from time to time. – Artem Sokolov Jul 08 '19 at 18:02
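Following up on converting functions for use with `!!!`, one sketch uses base R `body()` to take each function's unevaluated body and splices those bodies into `mutate()`. It relies on the same assumption as the `!!!` solution: every variable named in the bodies must be a column of the data (the function arguments themselves are ignored). The data here mirror the earlier example:

```r
library(dplyr)

f1 <- function(a) a^2
f2 <- function(a, b) a + b
f3 <- function(b) sqrt(b)

# body() returns each function's code as an unevaluated call, e.g. a^2
bodies <- lapply(list(f1 = f1, f2 = f2, f3 = f3), body)

df <- tibble(a = c(2, 4, 5, 7, 8), b = c(1, 1, 2, 2, 2))
# Splice the named calls into mutate(); each is evaluated against df's columns
res <- df %>% mutate(!!!bodies)
```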