What specifically are the dangers of eval(parse(...))?

Question

There are several questions on how to avoid using eval(parse(...)).

Which prompts the questions:

Why specifically should eval(parse()) be avoided?
And most importantly, what are the dangers?
- Are there any dangers if the code is not used in production? (I'm thinking of any danger of getting back unintended results. Clearly, if you are not careful about what you are parsing, you will have issues. But is that any more dangerous than being sloppy with get()?)

I don't know if I'll have time to create a good example/write a proper answer, but my headaches in this direction have had more to do with `eval()` generally than with `eval(parse())`. The problem with `eval()` is that it tries to follow some basic but often hard-to-understand rules about the frames in which the expression is evaluated, and these can often bite you when you start using code constructs in more complex ways than the author originally thought of ... http://obeautifulcode.com/R/How-R-Searches-And-Finds-Stuff/ — Ben Bolker, Nov 30 '12 at 18:51
@BenBolker, this question seems to still be hitting lots of search results. If you'd like to elaborate on your comment above, I would love to hear more of your thoughts, as I'm sure others would as well — Ricardo Saporta, Oct 31 '13 at 02:07
@Ricardo Saporta, you seem to enjoy using `data.table`. Interestingly, I came across a [post](https://stackoverflow.com/a/10676138/5193830) where is explained that `eval` can be faster than `get` in some cases with `data.table` — Valentin_Ștefan, Feb 05 '18 at 17:22
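
The point about evaluation frames raised in the first comment can be illustrated with a small sketch. The function names `f_default` and `f_caller` are made up for illustration; the only claim is that the same quoted expression can yield different values depending on which environment `eval()` is given.

```r
x <- "global"

# By default eval() evaluates in the frame it is called from,
# so the x local to the function is found first
f_default <- function() {
  x <- "local"
  eval(quote(x))
}

# Passing an explicit environment changes which x is found;
# parent.frame() here is the environment f_caller was called from
f_caller <- function() {
  x <- "local"
  eval(quote(x), envir = parent.frame())
}

res_default <- f_default()
res_caller  <- f_caller()
print(c(default = res_default, caller = res_caller))
```

When helper functions wrap `eval()` several layers deep, it becomes easy to lose track of which of these frames is actually in use.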

score 43 · Accepted Answer · edited May 23 '17 at 11:47

Most of the arguments against eval(parse(...)) arise not from security concerns (after all, no claims are made about R being a safe interface to expose to the Internet) but from the fact that such code is generally doing things that can be accomplished using less obscure methods, i.e. methods that are both quicker and easier for humans to parse. The R language is supposed to be high-level, so the preference of the cognoscenti (and I do not consider myself in that group) is to see code that is both compact and expressive.

So the danger is that eval(parse(..)) is a backdoor method of getting around lack of knowledge and the hope in raising that barrier is that people will improve their use of the R language. The door remains open but the hope is for more expressive use of other features. Carl Witthoft's question earlier today illustrated not knowing that the get function was available, and the question he linked to exposed a lack of understanding of how the [[ function behaved (and how $ was more limited than [[). In both cases an eval(parse(..)) solution could be constructed, but it was clunkier and less clear than the alternative.
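
As a rough sketch of the two cases mentioned above, here is how get() and [[ stand in for an eval(parse(..)) construction. The variable names (`var_name`, `scores`, `who`) are invented for the example.

```r
x <- 42
var_name <- "x"

# Look up a variable by name: get() instead of eval(parse(...))
via_eval <- eval(parse(text = var_name))
via_get  <- get(var_name)

# Extract a list element whose name is held in a variable:
# [[ accepts the name directly, whereas $ needs it pasted into code
scores <- list(alice = 90, bob = 85)
who <- "bob"
elem_eval <- eval(parse(text = paste0("scores$", who)))
elem_idx  <- scores[[who]]

print(identical(via_eval, via_get))
print(identical(elem_eval, elem_idx))
```

Both pairs give the same value, but the second form of each says what it does without building code from strings.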

score 38 · Answer 2 · edited Oct 01 '16 at 21:40

The security concerns only really arise if you start calling eval on strings that another user has passed to you. This is a big deal if you are creating an application that runs R in the background, but for data analysis where you are writing code to be run by yourself, then you shouldn't need to worry about the effect of eval on security.

There are some other problems with eval(parse(...)), though.

Firstly, code using eval-parse is usually much harder to debug than non-parsed code, which is problematic because debugging software is twice as difficult as writing it in the first place.

Here's a function with a mistake in it.

std <- function()
{
  mean(1to10)
}

Silly me, I've forgotten about the colon operator and created my vector wrongly. If I try and source this function, then R notices the problem and throws an error, pointing me at my mistake.

Here's the eval-parse version.

ep <- function()
{
  eval(parse(text = "mean(1to10)"))
}

This will source, because the error is inside a valid string. It is only later, when we come to run the code, that the error is thrown. So by using eval-parse, we've lost the source-time error checking capability.

I also think that this second version of the function is much more difficult to read.
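
To make the deferred failure concrete, here is a small sketch using tryCatch(). The helper `parses_ok` is not a base R function, just a hypothetical convenience for checking a code string before it is ever evaluated.

```r
# Defining the function succeeds: the syntax error is hidden inside a string
ep <- function() {
  eval(parse(text = "mean(1to10)"))
}

# The error only surfaces when the function is actually called
outcome <- tryCatch(
  ep(),
  error = function(e) paste("failed at run time:", conditionMessage(e))
)
print(outcome)

# If code strings are unavoidable, parsing them up front at least
# moves syntax errors earlier (hypothetical helper)
parses_ok <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}
print(parses_ok("mean(1to10)"))
print(parses_ok("mean(1:10)"))
```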

The other problem with eval-parse is that it is much slower than directly executed code. Compare

system.time(for(i in seq_len(1e4)) mean(1:10))
   user  system elapsed 
   0.08    0.00    0.07

and

system.time(for(i in seq_len(1e4)) eval(parse(text = "mean(1:10)")))
   user  system elapsed 
   1.54    0.14    1.69
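
Much of that overhead plausibly comes from re-parsing the same string on every iteration. The sketch below, which I have not benchmarked rigorously, parses once outside the loop or avoids strings altogether with quote(). Timings will vary by machine, so none are shown.

```r
code <- "mean(1:10)"

# Parse once, outside the loop, and keep the resulting call object
parsed <- parse(text = code)[[1]]

# Or skip strings entirely and quote the call
quoted <- quote(mean(1:10))

n <- 1e4
t_reparse <- system.time(for (i in seq_len(n)) eval(parse(text = code)))
t_parsed  <- system.time(for (i in seq_len(n)) eval(parsed))
t_quoted  <- system.time(for (i in seq_len(n)) eval(quoted))

# Elapsed seconds for each variant
print(c(reparse = t_reparse[["elapsed"]],
        parsed  = t_parsed[["elapsed"]],
        quoted  = t_quoted[["elapsed"]]))
```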

score 19 · Answer 3 · answered Dec 02 '12 at 10:44

Usually there's a better way of 'computing on the language' than working with code strings; eval-parse-heavy code needs a lot of safeguarding to guarantee a sensible output, in my experience.

The same task can usually be solved by working on R code as a language object directly; Hadley Wickham has written a useful guide on metaprogramming in R.
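
As a minimal illustration of working on language objects with base R only, the sketch below builds the same call four ways. The names `fn_name` and `values` are placeholders for whatever would otherwise be pasted into a string.

```r
fn_name <- "mean"
values  <- c(1, 2, 3, 10)

# String approach: paste code together, then parse and evaluate it
code <- paste0(fn_name, "(c(", paste(values, collapse = ", "), "))")
by_string <- eval(parse(text = code))

# Language-object approach: build the call directly
built   <- call(fn_name, values)
by_call <- eval(built)

# bquote() splices values into a quoted template
templ     <- bquote(.(as.name(fn_name))(.(values)))
by_bquote <- eval(templ)

# do.call() covers the common case of "call this function by name"
by_docall <- do.call(fn_name, list(values))

print(c(by_string, by_call, by_bquote, by_docall))
```

The non-string versions never pass through the parser, so values containing quotes or odd characters cannot turn into a syntax error or into unintended code.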

The defmacro() function in the gtools package is my favourite substitute (no half-assed R pun intended) for the eval-parse construct.

require(gtools)

# both action_to_take & predicate will be subbed with code

F <- defmacro(predicate, action_to_take, expr = 
    if(predicate) action_to_take)

F(1 != 1, action_to_take = print('arithmetic doesnt work!'))

F(pi > 3, action_to_take = return('good!'))
[1] "good!"

# the raw code for F
print(F)

function (predicate = stop("predicate not supplied"), action_to_take = stop("action_to_take not supplied")) 
{
    tmp <- substitute(if (predicate) action_to_take)
    eval(tmp, parent.frame())
}
<environment: 0x05ad5d3c>

The benefit of this method is that you are guaranteed to get back syntactically legal R code.

Hope that helps!

ORION · Answer 4 · 2012-11-30T17:57:06.473

In some programming languages, eval() is a function which evaluates a string as though it were an expression and returns a result; in others, it executes multiple lines of code as though they had been included instead of the line including the eval. The input to eval is not necessarily a string; in languages that support syntactic abstractions (like Lisp), eval's input will consist of abstract syntactic forms. http://en.wikipedia.org/wiki/Eval

There are all kinds of exploits that one can take advantage of if eval is used improperly.

An attacker could supply a program with the string "session.update(authenticated=True)" as data, which would update the session dictionary to set an authenticated key to be True. To remedy this, all data which will be used with eval must be escaped, or it must be run without access to potentially harmful functions. http://en.wikipedia.org/wiki/Eval

In other words, the biggest danger of eval() is the potential for code injection into your application. The use of eval() can also cause performance issues in some languages, depending on what it is being used for.
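
An R analogue of the quoted session example might look like the sketch below. The variable `authenticated` and the input string are hypothetical; the point is only that a string passed to eval(parse()) runs as code in the calling environment.

```r
authenticated <- FALSE

# Hypothetical input: the program expected something like "1 + 1"
user_input <- "authenticated <- TRUE"

# Evaluating the string runs it as code, so the assignment takes effect
eval(parse(text = user_input))
print(authenticated)

# A narrower alternative when only a number is expected:
# convert the text instead of evaluating it
safe_number <- suppressWarnings(as.numeric(user_input))
print(is.na(safe_number))
```

Converting or validating the input keeps it as data; anything that is not a plain number simply becomes NA instead of being executed.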

Specifically in R, it's probably because you can often use get() in place of eval(parse()), and your results will be the same without having to resort to eval().

concrete examples of strings a *smart* user or hacker could use: `T <- FALSE; F <- TRUE`, `rm(list=ls())`, `system("rm -rf your_directories")`, `source("http://.../virus.R")`. — flodel, Nov 30 '12 at 18:01
There is a secure version of `eval` (`eval.secure`) in the RAppArmor package that executes in a sandbox, and can't exercise superuser rights unless the parent process has them. — Matthew Plourde, Nov 30 '12 at 18:28
@ORION, thanks for addressing the security vulnerabilities! and +1 to @mplourde for `eval.secure` — Ricardo Saporta, Nov 30 '12 at 23:32
I do not think the first quoted material is correct. That describe the actions that I understand to be performed by `parse()`. The (valid) input to `eval` is almost never a "string". — IRTFM, Aug 21 '14 at 23:39
