Evaluate R function stored in column that references other column names

Question

I have the following dataset:

my.df <- data.frame(my_function=rep(c("Var1+Var 2","Var 2-Var1","(Var 2-(Var 2-Var1))/Var 2"), 1),
                    `Var1`=rep(1:1,3), 
                    `Var 2`=rep(5:5,3), check.names = FALSE)

my.df
#                  my_function Var1 Var 2
# 1                 Var1+Var 2    1     5
# 2                 Var 2-Var1    1     5
# 3 (Var 2-(Var 2-Var1))/Var 2    1     5

I want to evaluate the expression stored in the my_function column for each row and put the result in a new column called outcome.

The outcome for the three rows would be: 1+5=6, 5-1=4, (5-(5-1))/5=0.2.

EDIT Correct answers also reference the following original dataset:

my.df <- data.frame(my_function=rep(c("1000+2000","2000-1000","(2000-(2000-1000))/2000"), 1), `1000`=rep(1:1,3), `2000`=rep(5:5,3))

This would be massively easier if your code used valid R variable names instead of numbers. — But, more importantly: please provide some more background information: why are you doing this, where is the data coming from, etc? This is important for finding the most appropriate solution in your code (especially since evaluating arbitrary code provided externally is usually a *big* no-no, for reasons of efficiency as well as safety). — Konrad Rudolph, Nov 07 '22 at 13:11
Hi Konrad - I appreciate the help. I updated the calculation. I moved from 1000 and 2000 to "Var1" and "Var 2". "Var 2" is by choice. — J. Doe., Nov 07 '22 at 13:15
It has a space because I have many different named variables in a wide format and the calculations are considerably more complex and I need it to work for more complex names. Basically the function names are fixed and I cannot change them. — J. Doe., Nov 07 '22 at 13:20
I think you need `check.names = FALSE)` in your original dataset example, to avoid X prefixes. — zx8754, Nov 07 '22 at 13:26
Just to clarify: Do you want to show the whole equation as a result `1+5=6` or just the actual result on the right-hand side of the equation `6`? — TimTeaFan, Nov 07 '22 at 15:31

score 2 · Answer 1 · answered Nov 07 '22 at 13:35

Loop over the rows, use gsub to replace each column name in my_function with that row's value, then finally evil parse:

vars <- colnames(my.df)[ -1 ]

sapply(seq(nrow(my.df)), function(i){
  res <- my.df[i, 1]
  for(v in vars){
    res <- gsub(v, my.df[i, v], res, fixed = TRUE)
  }
  eval(parse(text = res))
})
# [1] 6.0 4.0 0.2

Note:

fortunes::fortune("answer is parse")
# If the answer is parse() you should usually rethink the question.
#    -- Thomas Lumley
#       R-help (February 2005)
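
A variation on the loop above avoids pasting values into the expression text: backtick-quote the column names, then evaluate each expression with its own row as the lookup scope. This is only a minimal sketch in base R, assuming the expressions contain nothing but column names and arithmetic. The names `pattern` and `quoted` are illustrative and not part of the answer. It still calls eval on text, so the warnings in this thread apply unchanged.

```r
my.df <- data.frame(
  my_function = c("Var1+Var 2", "Var 2-Var1", "(Var 2-(Var 2-Var1))/Var 2"),
  Var1 = 1, `Var 2` = 5, check.names = FALSE
)

vars <- colnames(my.df)[-1]
# Longest names first, so a name that is a prefix of another cannot win the match
vars <- vars[order(nchar(vars), decreasing = TRUE)]
# \Q...\E makes PCRE treat each column name literally
pattern <- paste0("(", paste0("\\Q", vars, "\\E", collapse = "|"), ")")

# Wrap every column name in backticks so that "Var 2" parses as a single symbol
quoted <- gsub(pattern, "`\\1`", my.df$my_function, perl = TRUE)

# Evaluate each expression with its own row supplying the variable values
my.df$outcome <- vapply(seq_len(nrow(my.df)), function(i) {
  eval(str2lang(quoted[i]), envir = my.df[i, , drop = FALSE])
}, numeric(1))

my.df$outcome
```

Replacing all names in a single regex pass avoids re-matching a name inside one that has already been quoted.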

answered Nov 07 '22 at 13:35

zx8754

I imagine you meant “eval parse,” but “*evil* parse” is consistent with all the warnings against arbitrary text evaluation in this thread. – zephryl Nov 07 '22 at 13:43
@zephryl yes... – zx8754 Nov 07 '22 at 13:44
For the lolz I upvoted. I don't usually use `parse` but when I do, it is unavoidable. – J. Doe. Nov 07 '22 at 13:55

score 0 · Answer 2 · answered Nov 07 '22 at 13:18

A solution could be:

my.df <- data.frame(my_function=rep(c("1000+2000","2000-1000","(2000-(2000-1000))/2000"), 1), `1000`=rep(1:1,3), `2000`=rep(5:5,3))

my.df
#>               my_function X1000 X2000
#> 1               1000+2000     1     5
#> 2               2000-1000     1     5
#> 3 (2000-(2000-1000))/2000     1     5

my.df$my_function = gsub("1000", "X1000", my.df$my_function)
my.df$my_function = gsub("2000", "X2000", my.df$my_function)

my.df$outcome = sapply(split(my.df, 1:NROW(my.df)), function(x)
  eval(str2lang(x$my_function),x))

my.df
#>                   my_function X1000 X2000 outcome
#> 1                 X1000+X2000     1     5     6.0
#> 2                 X2000-X1000     1     5     4.0
#> 3 (X2000-(X2000-X1000))/X2000     1     5     0.2

However, you should read the comments, since evaluating arbitrary code raises security concerns. See https://stackoverflow.com/a/18391779/6912817 for an example.
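
One way to narrow that risk is to evaluate the parsed expression in a scope that exposes only arithmetic operators. The following is a sketch under that assumption, not a complete sandbox. `safe_ops` and `safe_eval` are hypothetical names introduced here, and the list of allowed operators is a guess at what the formulas need.

```r
# Environment holding only the operators we are willing to allow;
# its parent is the empty environment, so nothing else is reachable.
safe_ops <- new.env(parent = emptyenv())
for (op in c("+", "-", "*", "/", "^", "(")) {
  assign(op, get(op, envir = baseenv()), envir = safe_ops)
}

# Hypothetical helper: evaluate one expression string against a list of values.
# Symbols are looked up in `values` first, then in `safe_ops`, and nowhere else.
safe_eval <- function(expr_text, values) {
  eval(str2lang(expr_text), envir = values, enclos = safe_ops)
}

ok <- safe_eval("(X2000-(X2000-X1000))/X2000", list(X1000 = 1, X2000 = 5))

# A call to anything outside the whitelist fails instead of running
blocked <- tryCatch(
  safe_eval("system('ls')", list()),
  error = function(e) conditionMessage(e)
)
```

This only restricts which functions can be called. It does not limit how long an expression takes to evaluate, and any function placed in `values` would still be callable.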

score 0 · Answer 3 · answered Nov 07 '22 at 13:22

As expressed in the comments, I don't love parsing code from text, especially if the code text was generated through some user input. Here is, in my opinion, a safe way to evaluate these expressions:

library(tidyverse)

my.df <- data.frame(my_function=rep(c("1000+2000","2000-1000","(2000-(2000-1000))/2000"), 1), `1000`=rep(1:1,3), `2000`=rep(5:5,3))

my.df |>
  mutate(sub_function = pmap_chr(list(my_function, X1000, X2000),
                                 ~gsub(pattern = "1000", 
                                      replacement = ..2,
                                      x = ..1) |> 
                                   gsub(pattern = "2000",
                                       replacement = ..3)),
         eval = map_chr(sub_function, ~as.character(Ryacas::yac_symbol(.x))))
#>               my_function X1000 X2000 sub_function eval
#> 1               1000+2000     1     5          1+5    6
#> 2               2000-1000     1     5          5-1    4
#> 3 (2000-(2000-1000))/2000     1     5  (5-(5-1))/5  1/5

score 0 · Answer 4 · answered Nov 07 '22 at 13:39

Using rlang and purrr::pmap_dbl() (this assumes the columns are named Var1 and Var2, without a space):

library(rlang)
library(purrr)

my.df$outcome <- pmap_dbl(
  my.df,
  \(my_function, Var1, Var2, ...) {
    eval(parse_expr(enexpr(my_function)))
  }
)

my.df

#>               my_function Var1 Var2 outcome
#> 1               Var1+Var2    1    5     6.0
#> 2               Var2-Var1    1    5     4.0
#> 3 (Var2-(Var2-Var1))/Var2    1    5     0.2

TimTeaFan · Answer 5 · 2022-11-07T15:30:21.293

Here is another approach using bquote and deparse. Since your example data uses integers, I first convert them to doubles to get rid of the L suffix in the output.

my.df <- data.frame(
  my_function = rep(c("Var1+Var 2",
                      "Var 2-Var1",
                      "(Var 2-(Var 2-Var1))/Var 2"),
                    1),
  `Var1` = rep(1:1,3),
  `Var 2` = rep(5:5,3),
  check.names = FALSE)

library(dplyr)
library(stringr)

my.df %>% 
  mutate(across(starts_with("Var"), as.double)) %>%
  rowwise() %>% 
  mutate(outcome = str_replace_all(my_function,
                                   "(Var\\s{0,1}[0-9]+)",
                                   '.(.data[["\\1"]])') %>% 
           paste0("bquote(", ., ")") %>%
           str2lang %>%
           eval %>%
           list,
         outcome = paste0(deparse(outcome), " = ", eval(outcome)))

#> # A tibble: 3 x 4
#> # Rowwise: 
#>   my_function                 Var1 `Var 2` outcome              
#>   <chr>                      <dbl>   <dbl> <chr>                
#> 1 Var1+Var 2                     1       5 1 + 5 = 6            
#> 2 Var 2-Var1                     1       5 5 - 1 = 4            
#> 3 (Var 2-(Var 2-Var1))/Var 2     1       5 (5 - (5 - 1))/5 = 0.2

Created on 2022-11-07 by the reprex package (v2.0.1)
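
The same "expression = result" string can also be sketched in base R for a single row, using substitute() to fill the values into the parsed expression. This assumes the column names have already been backtick-quoted in the expression text. `row_values`, `filled` and `result` are illustrative names.

```r
# Values for one row, stored as doubles so deparse() prints no "L" suffix
row_values <- list(Var1 = 1, `Var 2` = 5)

# Parse the expression; backticks let "Var 2" act as a single symbol
expr <- str2lang("(`Var 2`-(`Var 2`-`Var1`))/`Var 2`")

# Replace each symbol by its value without evaluating anything yet
filled <- do.call(substitute, list(expr, row_values))

# Show the filled-in expression next to its value
result <- paste(deparse(filled), "=", eval(filled))
result
#> [1] "(5 - (5 - 1))/5 = 0.2"
```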