
Consider the dummy example below. I want to run a model on a range of subsets of a data.table in a loop, and I want to specify the exact expression to evaluate on each iteration as a string (with an iterator i).

library(data.table)

DT <- data.table(X = runif(100), Y = runif(100))

f1 <- function(code) {
  
  for (i in c(20,30,50)) {
    
    eval(parse(text = code))
    
  }
  
}

f1("lm(X ~ Y, data = DT[sample(.N, i)])")

Obviously this doesn't return any output as lm() is merely evaluated in the background 3 times. The actual use case is more convoluted, but this is meant to be a theoretical simplification of it.

The example above, nonetheless, works fine. The problems begin when the function f1 is part of a package instead of being defined in the global environment. If I'm not mistaken, in this case f1 is defined in the package's base env. Calling f1 from the global env then gives the error: Error in `[.data.frame`(x, i) : undefined columns selected. R can correctly access the iterator i in its base env and DT in the global env, but cannot access the column by name inside data.table's square brackets.

I tried experimenting by setting envir and enclos arguments to eval() to baseenv(), globalenv(), parent.frame(), but haven't managed to find a combination that works.

For example, setting envir = globalenv() seems to give access to DT and i, but not to X and Y from DT inside lm(). With envir = baseenv() we lose the global env and cannot access DT (envir = baseenv(), enclos = globalenv() doesn't change that). Using envir = list(baseenv(), globalenv()) results, I think, in not being able to access anything inside data.table's square brackets; the error message is: "Error in `[.data.frame`(x, i) : undefined columns selected".

Mihail
  • Your code is applied in the `for` loop. You may need to initialize an object to store the output and return the value – akrun Feb 14 '23 at 19:18
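
A minimal sketch of what akrun's comment suggests: store each result in a list and return it from the function. The name `f1_collect` and the `sizes` argument are made up for illustration, and a plain data.frame stands in for the data.table so the snippet runs without extra packages.

```r
# Hypothetical variant of f1 that keeps the result of every iteration
f1_collect <- function(code, sizes = c(20, 30, 50)) {
  results <- vector("list", length(sizes))   # one slot per subset size
  for (k in seq_along(sizes)) {
    i <- sizes[k]                            # iterator referenced inside `code`
    results[[k]] <- eval(parse(text = code)) # evaluated in this function's frame
  }
  names(results) <- sizes
  results
}

# Assumption: a base data.frame instead of a data.table
DT <- data.frame(X = runif(100), Y = runif(100))
fits <- f1_collect("lm(X ~ Y, data = DT[sample(nrow(DT), i), ])")
sapply(fits, nobs)  # number of observations used by each fit
```

Each element of `fits` is an ordinary `lm` object, so it can be summarised or compared after the loop has finished.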

1 Answer


The problem is that variables are resolved lexically: code evaluated inside f1 looks names up starting from f1's own environment rather than the caller's. You could try passing in the expression itself and then substituting the value of i before evaluating it in the caller's frame. This also eliminates the need for explicit parsing.

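f1 <- function(code) {
  # Capture the unevaluated expression passed by the caller
  code <- substitute(code)
  
  for (i in c(20,30,50)) {
    # Replace the symbol i in the captured expression with its current value
    cmd <- do.call("substitute", list(code, list(i=i)))
    print(cmd)
    # Evaluate the finished call in the caller's frame, where DT is visible
    result <- eval.parent(cmd)
    print(result)
  }
}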

f1(lm(X ~ Y, data = DT[sample(.N, i)]))
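
If the goal is to time each evaluation rather than print it, the same substitution idea could be wrapped in `system.time()`. The following is only a sketch under assumptions: `time_calls` and `sizes` are hypothetical names, the caller's frame is captured once with `parent.frame()` instead of using `eval.parent()`, and a plain data.frame is used so that the snippet runs with base R alone.

```r
# Hypothetical helper: time an expression for several values of i
time_calls <- function(code, sizes = c(20, 30, 50)) {
  code <- substitute(code)   # unevaluated expression from the caller
  env <- parent.frame()      # frame the function was called from
  elapsed <- numeric(length(sizes))
  for (k in seq_along(sizes)) {
    # Insert the current value of i into the captured expression
    cmd <- do.call("substitute", list(code, list(i = sizes[k])))
    # Evaluate in the caller's frame and keep the wall-clock time in seconds
    elapsed[k] <- system.time(eval(cmd, env))[["elapsed"]]
  }
  data.frame(size = sizes, elapsed = elapsed)
}

# Assumption: a base data.frame instead of a data.table
df <- data.frame(X = runif(100), Y = runif(100))
timings <- time_calls(lm(X ~ Y, data = df[sample(nrow(df), i), ]))
print(timings)
```

Because the expression is evaluated inside the `system.time()` call, the reported time covers the model fit itself and not just the construction of the call.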
MrFlick
  • Thanks @MrFlick, this fixed the eval issue, but created or revealed another issue elsewhere. Further down inside `f1` I have `timings <- data.table(...)`, which creates a data.table and works fine. (`f1` includes a call `library(data.table)`). But then the operation `timings[, t := ...]` fails due to `Check that is.data.table(DT) == TRUE. Otherwise, := and \`:=\`(...) are defined for use in j, once only and in particular ways.` data.table is a bit unusual in that it not only defines functions, but also its own syntax and operators inside square brackets. `timings` is used only internally inside `f1`. – Mihail Feb 15 '23 at 10:31
  • [this](https://stackoverflow.com/a/10529888/2753688) fixed the above. – Mihail Feb 15 '23 at 10:48
  • I take it back, the eval issue isn't fixed. What I am really trying to do is measure evaluation times with `start <- Sys.time(); eval(...); t <- Sys.time() - start`. My original version wouldn't even evaluate if `f1` is defined inside a package. Your version seems to "run" without error, but evidently doesn't evaluate, since the running times for each call are measured in fractions of a second instead of several seconds or minutes, as expected. – Mihail Feb 15 '23 at 10:56
  • Final update: ironically, the ultimate fix turned out to be going back to my original version, `eval(parse(text = code))`, which started working perfectly fine after I made the package [data.table-aware](https://stackoverflow.com/a/10529888/2753688). – Mihail Feb 15 '23 at 11:24
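
For readers wondering what "data.table-aware" means in practice: as far as I understand it, data.table only applies its own `[` semantics when the calling code lives in a namespace that imports data.table (or opts in explicitly). Otherwise it falls back to data.frame behaviour, which is consistent with the `[.data.frame` error in the question. A sketch of the two usual ways to opt in, assuming a hypothetical package `mypkg` documented with roxygen2:

```r
# R/mypkg-package.R  (hypothetical file in a hypothetical package "mypkg")

# Option 1: import data.table into the package namespace.
# With roxygen2 this tag writes `import(data.table)` into NAMESPACE;
# DESCRIPTION should also list data.table under Imports.
#' @import data.table
NULL

# Option 2: opt in explicitly without a namespace-wide import,
# e.g. when data.table is only listed under Suggests.
.datatable.aware <- TRUE
```

Either option should be enough on its own. The first matches the linked answer, while the second is the flag described in data.table's own documentation on importing the package.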