Safely evaluating arithmetic expressions in R?

Question

Edit

OK, since there seems to be a lot of confusion, I'm going to simplify the question a little. You can try to answer the original question below, or you can tackle this version instead and ignore everything below the line.

My goal is to take an arbitrary expression and evaluate it in an extremely restricted environment. This environment will contain only variables with the following types of values:

- Numeric vectors
- Pure functions that take one or more numeric vectors and return numeric vectors (i.e. arithmetic operators)

In addition, the expression must be able to use literals, such as numeric and string constants (but not numeric or string vectors, since building those would require c()). I would like to evaluate the expression in this environment and ensure that there is no way for it to access anything outside the environment, so that I can be sure that evaluating it is not a security risk. So, in the code below, can you fill in the blank with a string that will do something naughty when evaluated? "Something naughty" is defined as printing something to the screen, accessing the value of the variable secret, executing any shell command (preferably one that produces output), or anything else that seems naughty to you (justify your choice).

a <- 1
b <- 2
x <- 5
y <- 1:10
z <- -1

## Give secret a random value so that you can't just compute it from
## the above variables
secret <- rnorm(5)

allowed.variables <- c(
    ## Numeric variables
    "a", "b", "x", "y", "z",
    ## Arithmetic operators
    "(", "+", "-", "/", "*", "^", "sqrt", "log", "log10", "log2", "exp", "log1p")

restricted.environment <- Map(get, allowed.variables)

## Example naughty expressions that my method successfully guards
## against
expr1 <- "secret"
expr2 <- "cat('Printing something with cat\n')"
expr3 <- "system('echo Printing something via shell command')"

arbitrary.expression <- "?????????" # Your naughty string constant here

eval(parse(text=arbitrary.expression), envir=restricted.environment, enclos=emptyenv())
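A quick way to check that this method blocks the three example expressions, while ordinary arithmetic still goes through, is to wrap each evaluation in tryCatch(). This is only a sketch: the helper try_restricted() and the results list are hypothetical names, and mget(..., inherits = TRUE) is used in place of Map(get, ...) only so that the lookup does not depend on where the block is run.

```r
## Rebuild the question's setup so the block runs on its own.
a <- 1; b <- 2; x <- 5; y <- 1:10; z <- -1
secret <- rnorm(5)

allowed.variables <- c("a", "b", "x", "y", "z",
                       "(", "+", "-", "/", "*", "^",
                       "sqrt", "log", "log10", "log2", "exp", "log1p")
## Same named list as Map(get, allowed.variables); inherits = TRUE lets the
## operators be found in base.
restricted.environment <- mget(allowed.variables, envir = environment(),
                               inherits = TRUE)

## Hypothetical helper: evaluate a string in the restricted environment and
## record either its value or the error message.
try_restricted <- function(text) {
  tryCatch(
    list(ok = TRUE,
         value = eval(parse(text = text),
                      envir = restricted.environment, enclos = emptyenv())),
    error = function(e) list(ok = FALSE, value = conditionMessage(e))
  )
}

results <- lapply(c(expr1 = "secret",
                    expr2 = "cat('Printing something with cat')",
                    expr3 = "system('echo Printing something via shell command')",
                    arithmetic = "(a + b) * sqrt(x) + log(y)"),
                  try_restricted)
str(results)
```

The three naughty strings come back with ok = FALSE because secret, cat and system simply do not exist anywhere on the lookup path, while the arithmetic expression is evaluated normally.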

Original question

I am writing some code to take an arithmetic expression as user input and evaluate it. I have a specified set of variables that can be used, and a whitelist of arithmetic functions (+, -, *, /, ^, etc.). Is there any way that I can evaluate an expression so that only these variables and operators are in scope, in order to avoid any possibility of arbitrary code injection? I have something that I think works, but I don't want to actually use it unless I have some certainty that it is really bulletproof:

## Shortcut for parse-then-eval pattern
evalparse <- function(expr, ...) eval(parse(text=expr), ...)

# I control these
arithmetic.operators <- Map(get, c("(", "+", "-", "/", "*", "^", "sqrt", "log", "log10", "log2", "exp", "log1p"))
vars <- list(a=1, b=2)
safe.envir <- c(vars, arithmetic.operators)

# Assume that these expressions are user input, e.g. from a web form.
nice.expr <- "a + b"
naughty.expr <- paste("cat('ARBITRARY R CODE INJECTION\n'); system('echo ARBITRARY SHELL COMMAND INJECTION');", nice.expr)

## NOT SAFE! Lookups outside env still possible.
evalparse(nice.expr, envir=safe.envir)
evalparse(naughty.expr, envir=safe.envir)

## Is this safe?
evalparse(nice.expr, envir=safe.envir, enclos=emptyenv())
evalparse(naughty.expr, envir=safe.envir, enclos=emptyenv())

If you run the above code in R, you'll see that the first time we evaluate naughty.expr, it successfully executes its payload. However, the second time, with enclos=emptyenv(), the evaluation only has access to the variables a and b and the specified arithmetic operators, so the payload fails to execute.
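The difference between the two calls comes from how eval() treats a list: when envir is a list, any name not found in it is looked up in enclos, which defaults to the calling frame. A minimal sketch of both behaviours, using a stand-in vars list and a harmless cat() call as the payload (both assumptions, not the question's exact code). It also shows the flip side: with emptyenv(), even + has to be placed in the list explicitly.

```r
vars <- list(a = 1, b = 2)
payload <- quote(cat("reached cat\n"))

## Default enclos: 'cat' is not in vars, so it is found through the calling
## environment and runs.
leaked <- capture.output(eval(payload, envir = vars))
print(leaked)

## enclos = emptyenv(): nothing outside vars is visible, so the call fails.
blocked <- tryCatch(eval(payload, envir = vars, enclos = emptyenv()),
                    error = function(e) conditionMessage(e))
print(blocked)

## The flip side: '+' is invisible too unless it is put in the list.
no_plus <- tryCatch(eval(quote(a + b), envir = vars, enclos = emptyenv()),
                    error = function(e) conditionMessage(e))
with_plus <- eval(quote(a + b), envir = c(vars, list("+" = `+`)),
                  enclos = emptyenv())
print(with_plus)
```

This is why the question builds safe.envir from both the variables and the whitelisted operators: emptyenv() cuts off every other source of names.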

So, is this method (i.e. eval(..., envir=safe.envir, enclos=emptyenv())) actually OK to use in production with real user input, or am I missing some sneaky way to still execute arbitrary code in the restricted environment?

It's not clear what is actually user input. Does the user assign values to `a` and `b`? Do they have control over the expression? — Joshua Ulrich, Aug 22 '13 at 01:21
`naughty.expr <- "(a+b)*(b+b+a)*(b^b^b+b+a)*((b^(a+b)*(b*b+a)*(b*b*b-a))+a)"` — flodel, Aug 22 '13 at 01:56
Have a look at how http://hackme.rapporter.net/ implements a sandbox . — mnel, Aug 22 '13 at 02:02
@mnel: except several people were able to break that. Last I checked, my message was still in the file you're not supposed to be able to write to. — Joshua Ulrich, Aug 22 '13 at 02:11
Sorry for the confusion, the expression to be evaluated is the user input. In other words, assume that `naughty.expr` could be any R code, and I only want to evaluate it if it is an arithmetic expression (only using the functions I've specified). I've added a comment to the example code to clarify. – Ryan C. Thompson, Aug 22 '13 at 02:22
@RyanThompson: can you give an example? Your current example only works if the expression uses `a` and/or `b`. FWIW, I'm not confident this is secure (or can be made secure). — Joshua Ulrich, Aug 22 '13 at 02:26
That's the idea. I have a fixed set of variables whose values will all be numeric and a fixed set of operations allowed on those variables, and I want to take a user-provided arithmetic expression in terms of those variables and calculate the result and return it to the user. You can change the example to put any number of numeric vectors in `vars`. — Ryan C. Thompson, Aug 22 '13 at 02:40
Looking at sandboxR, it looks like that takes essentially the same approach only using a blacklist instead of a whitelist. — Ryan C. Thompson, Aug 22 '13 at 02:44
Your `mget` line fails for me with `argument "envir" is missing, with no default` (R 2.15.1, so maybe it needs 3.x?) — Spacedman, Aug 22 '13 at 06:41
Yes, I'm using 3.0.1, where `mget` has a default for the `envir` argument. I've edited my code to use `get` instead. — Ryan C. Thompson, Aug 22 '13 at 11:11
How do you stop the user from doing a function overload? How do you stop the user from defining his own operator, using terms in your whitelist? For example, take a look at the `sos` package code. It's easy to mod that so a user could have an operator "?+" and have that operator do "something bad." I'm no code jock, but I bet similar tricks could be done with whatever's in your whitelist. — Carl Witthoft, Aug 22 '13 at 11:42
The only user input is the expression. I control the available variables, their values, and the function whitelist, including the definitions of those functions (which will just be the default definitions). The user can't override anything because `<-` and other functions that can do assignments are not on the whitelist. – Ryan C. Thompson, Aug 22 '13 at 11:48
How about a radically different approach: create a TK or other GUI panel with drop-down lists. That way the user can't create anything that's not in your "whitelist" in the first place. Except for little Bobby Tables http://xkcd.com/327/ , who can bypass anything. – Carl Witthoft, Aug 22 '13 at 13:22
How would I use drop-down lists to allow the user to specify an arbitrary arithmetic expression? — Ryan C. Thompson, Aug 22 '13 at 18:16
I'm sorry, I don't understand what you mean by "dynamic assignment" in this context? What does dynamic assignment have to do with GUI drop-down lists? — Ryan C. Thompson, Aug 22 '13 at 20:39
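The point above that the user can't override anything holds because every assignment form in R is itself a call to a function (<-, =, <<-, assign()), and 100 -> a is parsed as a call to <-. A small sketch, assuming a safe.envir that exposes only two numbers and two operators (try_eval() is a hypothetical helper), shows each form failing to find its function:

```r
## Assumed setup: two numbers and two operators, no assignment functions.
safe.envir <- list(a = 1, b = 2, "+" = `+`, "*" = `*`)

## Hypothetical helper: evaluate user text, returning the value or the error.
try_eval <- function(text) {
  tryCatch(eval(parse(text = text), envir = safe.envir, enclos = emptyenv()),
           error = function(e) conditionMessage(e))
}

attempts <- c("a <- 100", "a = 100", "100 -> a", "a <<- 100",
              "assign('a', 100)")
messages <- vapply(attempts, try_eval, character(1))
print(messages)

## Ordinary arithmetic on the whitelisted names still works.
print(try_eval("a + b * 2"))
```

Defining a new operator (as in the `sos`-style trick mentioned above) would likewise need both an assignment function and `function`, neither of which is reachable here.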

score 15 · Accepted Answer · answered Aug 22 '13 at 22:24


I'd take a slightly different approach to defining the safe functions and the environment in which you evaluate arbitrary code, but it's really just some style changes. This technique is provably safe, provided all of the functions in safe_f are safe, i.e. they don't allow you to perform arbitrary code execution. I'd be pretty confident the functions in this list are safe, but you'd need to inspect their individual source code to be sure.

safe_f <- c(
  getGroupMembers("Math"),
  getGroupMembers("Arith"),
  getGroupMembers("Compare"),
  "<-", "{", "("
)

safe_env <- new.env(parent = emptyenv())

for (f in safe_f) {
  safe_env[[f]] <- get(f, "package:base")
}

safe_eval <- function(x) {
  # Capture the unevaluated argument and evaluate it only in safe_env
  eval(substitute(x), envir = safe_env)
}

# Can't access variables outside of that environment
a <- 1
safe_eval(a)    

# But you can create in that environment
safe_eval(a <- 2)
# And retrieve later
safe_eval(a)
# a in the global environment is not affected
a

# You can't access dangerous functions
safe_eval(cat("Hi!"))

# And because function isn't included in the safe list
# you can't even create functions
safe_eval({
  log <- function() {
    stop("Danger!")
  }
  log()
})

This is a much simpler problem than the rapporter sandbox because you're not trying to create a useful R environment, just a useful calculator environment, and the set of functions to check is much, much smaller.
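One possible hardening step, offered as an assumption rather than as part of the answer above, is to reject an expression before evaluating it if all.names() reports any name outside the whitelist, and to evaluate each input in a fresh child of safe_env so that assignments made with <- do not persist between calls. safe_eval_text() is a hypothetical name; note that its check is stricter than safe_eval(), since it also rejects new variable names.

```r
library(methods)  # getGroupMembers() lives in methods

## Rebuild the answer's whitelist and environment so the block runs alone.
safe_f <- c(getGroupMembers("Math"), getGroupMembers("Arith"),
            getGroupMembers("Compare"), "<-", "{", "(")
safe_env <- new.env(parent = emptyenv())
for (f in safe_f) safe_env[[f]] <- get(f, "package:base")

## Hypothetical helper: reject any name outside the whitelist or the supplied
## variables before evaluating, then evaluate in a fresh child of safe_env so
## assignments made with <- are discarded after the call.
safe_eval_text <- function(text, vars = list()) {
  exprs <- parse(text = text, keep.source = FALSE)
  used <- unique(all.names(exprs))
  bad <- setdiff(used, c(safe_f, names(vars)))
  if (length(bad) > 0) {
    stop("disallowed names: ", paste(bad, collapse = ", "))
  }
  call_env <- list2env(vars, parent = safe_env)
  eval(exprs, envir = call_env)
}

print(safe_eval_text("a + b * 2", list(a = 1, b = 3)))
print(safe_eval_text("a <- 10; a", list(a = 1)))
print(safe_eval_text("a", list(a = 1)))
blocked <- tryCatch(safe_eval_text("system('ls')"),
                    error = function(e) conditionMessage(e))
print(blocked)
```

With this check, a string such as "system('ls')" is refused before anything is evaluated, and reassigning a in one call does not change what a later call sees.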

hadley

Thanks. That's what I was hoping to hear. I was worried there might be some sort of special technique that I didn't know about for accessing values in arbitrary environments even if they're not on the current search path. – Ryan C. Thompson Aug 22 '13 at 22:38
@RyanThompson there are lots of techniques, but they all rely on functions that you're not allowing. – hadley Aug 22 '13 at 23:19
Yes, but I was worried that there might be some obscure syntax element that did not require any functions but was simply part of the language. But I guess it's really true that *every* element of R's syntax is ultimately just syntactic sugar for a function call. – Ryan C. Thompson Aug 23 '13 at 01:10
this is very cool. I need to understand how to filter a dataframe within the safe_env. Even if I include '[' in the list of allowed functions and place a dataframe 'df' into the safe environment, things like: safe_eval( df[1,1] ) don't seem to work... the goal is to let user input (mostly "Compare" functions) guide the filtering of a dataframe. – Nathan Siemers Mar 27 '18 at 22:29
explicitly adding '[.data.frame' to the list of allowed functions solved this. – Nathan Siemers Mar 27 '18 at 22:45
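The remark above that R syntax ultimately reduces to function calls can be checked directly by parsing a few expressions and inspecting the function name at the head of each call. A short sketch; the expressions are arbitrary examples, and bare names and literals (which are not calls) are the only pieces that need no function lookup:

```r
## A few arbitrary expressions covering control flow, braces, parentheses,
## indexing and function definition.
exprs <- list(
  quote(if (a > b) a else b),
  quote({ a; b }),
  quote((a + b)),
  quote(y[2]),
  quote(function(v) v + 1)
)

## The first element of each call is the name of the function R will look up.
heads <- vapply(exprs, function(e) as.character(e[[1]]), character(1))
print(heads)

## Because 'if' is looked up like any other function, it cannot run in an
## environment chain that does not contain it.
no_if <- tryCatch(eval(quote(if (TRUE) 1), envir = list(), enclos = emptyenv()),
                  error = function(e) conditionMessage(e))
print(no_if)
```

The same lookup explains the data frame comments: `[` dispatches to a method, so the method itself has to be reachable from the restricted environment.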
