I need to process some user-provided code on the server using PHP. The code only needs to cover some very basic programming capabilities, for example: variables, literals, (preferably) functions, and some associated operations.

One option is the dangerous eval() function. For my specific case it is overwhelmingly and redundantly full-featured, on top of its security issues and performance bottlenecks. Sanitizing the tokens using token_get_all() protects against Murphy, not Machiavelli! Regardless of its downsides, it is truly capable of what I'm trying to achieve.
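
To make the token-filtering idea concrete, here is a minimal sketch of an allowlist check built on token_get_all(). The function name findDisallowedTokens and the tiny allowlist are assumptions made for illustration, not a vetted policy. It only shows the mechanism: every token added to the allowlist widens the attack surface (allowing parentheses and string literals, for instance, also allows variable function calls), which is why this guards against accidents rather than attackers.

```php
<?php

/**
 * Return the tokens in $code that are not on a small allowlist.
 * The allowlist below is an illustrative assumption, not a vetted policy.
 */
function findDisallowedTokens(string $code): array
{
    $allowedNames = ['T_OPEN_TAG', 'T_WHITESPACE', 'T_VARIABLE', 'T_LNUMBER', 'T_DNUMBER'];
    $allowedChars = ['=', '+', '-', '*', '/', ';'];
    $rejected = [];

    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Named token: [id, text, line]
            $name = token_name($token[0]);
            if (!in_array($name, $allowedNames, true)) {
                $rejected[] = $name . ' ' . $token[1];
            }
        } elseif (!in_array($token, $allowedChars, true)) {
            // Single-character token such as "(" or "{"
            $rejected[] = $token;
        }
    }

    return $rejected;
}

// Plain arithmetic passes; a function call is rejected.
$ok  = findDisallowedTokens('<?php $a = 1 + 2;');
$bad = findDisallowedTokens('<?php system("ls");');
```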

I've also checked Symfony's ExpressionLanguage; it has some shortcomings:

  • it cannot detect the "variables" on its own (they should be introduced and known beforehand)
  • it lacks basic variable functionalities (only initializes them: no assignment functionality)
  • it's designed only for "one liner" expressions

Alas! A more sophisticated ExpressionLanguage would have sufficed.

I'm looking for something that gives users some very basic "programming" capability. Is there such a thing, and if so, what is it? (It may even be written in another language, as long as it can be used on a server somehow.) If no such thing exists, how should I treat eval() so that it doesn't drown me?! Or, as a last resort, how might I design such a simple programming capability myself? (Please elaborate on the matters :)


As per the comments below, here is a list of the "programming" features the code syntax needs to support. It would suffice if the following were supported in addition to what the ExpressionLanguage syntax provides:
  • Sequential flow: executing the instructions one after another (contrary to the "one liner" nature of the ExpressionLanguage)
  • Local variable declaration (and detection of those variables afterwards, of course)
  • Variable assignments from expressions (any composition of literals, function invocations, operators)
  • Passing variables to functions
  • Flow control constructs: at least a conditional construct (e.g: if) and a repetition construct (e.g: for loop)

  • Run the program in a sandbox, so it can't access anything and cause damage. – Barmar May 19 '20 at 20:31
  • Virtual machines are also a solution. – Barmar May 19 '20 at 20:33
  • Docker containers, similar solutions. – Barmar May 19 '20 at 20:33
  • @Barmar Could you please elaborate. –  May 19 '20 at 20:36
  • Not really, I don't know the details of setting up these things. But surely you can find what you need from google. – Barmar May 19 '20 at 20:38
  • There's a really infamous [part of Drupal](https://www.drupal.org/docs/7/howtos/add-php-code-to-the-body-of-a-drupal-7-block) that was intentionally removed and then put into a [module](https://www.drupal.org/project/php) that does something along the lines of what you are asking. After reading through the warnings, you'll see that their solution was to only allow "competent PHP coders you trust" to use that filter. That is really the only safe solution to allowing someone to execute code on your server. Even if you built an AST parser, those things blow up all the time. – Chris Haas May 19 '20 at 21:57
  • As everyone else says, if you want another solution, find a way to spawn your code on another machine/process/instance that you don't care about if it gets hacked and taken over. But, and this is important, you can't trust the output of that machine either, because a malicious person might have done something crazy there, too. – Chris Haas May 19 '20 at 21:59
  • @dev2020 How complex are these PHP scripts and do they have to be PHP scripts at all? Is it possible that you can define your own subset of functions, even your own programming language, so the users actually can't do anything harmful? What are these (PHP) scripts doing and why do they need the full power of a complete programming language? And are the end-users smart enough to program? – Progman May 30 '20 at 20:09
  • @Progman Assume the end user will always be smart enough to find any existing exploit. Either write your own parser and interpreter (not suggested), or go the easy route and let the user write their own Javascript scripts which execute on the browser (no server security issues). – kmoser May 30 '20 at 20:18
  • @Progman The code does not need to be PHP at all! It just needs to be capable of the basic programming capabilities (mentioned in the post), the ultimate goal of the "code" is to modify some pre-defined variables (local disposable variables can help in the operations of course). The end users are not programmers, just simple eager people! –  May 30 '20 at 20:22
  • @kmoser The code is supposed to be run on the server somehow; thus, the browser-based solution is not applicable. –  May 30 '20 at 20:25
  • @dev2020 Why does the code have to run on the server? How would running it in the browser fail to meet your requirements? In other words, can you explain how this is not an [XY problem](http://xyproblem.info/)? – kmoser May 30 '20 at 22:58
  • @kmoser for a broader picture: The goal of the solution is to **arbitrarily** modify and store the final values of some user-defined variables (the code might have some locally defined (and finally disposable) vars and programming-alike facilities to help the process). The browser is involved only as a UI to enter the "code", the code should run in a context-free environment to modify the provided variables. The code runs against the defined variables having arbitrary values, the code may ultimately modify the variable values; the variable values will be stored to be utilized later. –  May 31 '20 at 05:31
  • @dev2020 I still don't see why you can't write such an environment in Javascript. The user-provided code could still be stored in a server-side database, but executed in the browser. If a malicious user tweaks the runtime environment, the worst that would happen is that the code would fail to run. There would be no server-side security issues. – kmoser May 31 '20 at 06:28
  • @kmoser because the "code" is supposed to be executed regardless of the browser; when it executes, there's no browser-context (think of it analogous to the SQL queries, or whatever completely independent of the browser context). –  May 31 '20 at 08:20
  • @Progman The post has been updated to include the list of programming features required (hopefully to not have missed any other critical aspects). What I'm looking after is really a basic "programming" syntax. Please kindly check it out. –  May 31 '20 at 10:51
  • Perhaps something like the [Judge0 API](https://rapidapi.com/hermanzdosilovic/api/judge0/details) would work for you. – jdaz Jun 01 '20 at 05:04
  • @dev2020, I think this really boils down to whether you want to create a new language or use an existing one. If you reuse one like PHP, you’ll have to decide what parts of the language you are going to try to block, such as classes and namespaces, and maybe even potentially unsafe things like import and mysqli_*. If you create a new language, it will be more work upfront but at least you’ll have control over things. – Chris Haas Jun 02 '20 at 14:31
  • @dev2020 _"The end users are not programmers, just simple eager people!"_ this is the point where I would go for a custom "input language". Parse the user input and handle all basic functionalities in the background (similar to MarkDown). This reduces security issues and even more important: it can handle unforeseen inputs and code errors. I had a similar project, where text input should be used to control a work flow (if / loops / vars / ...). A real scripting language was the starting point, but at the end I came up with a custom backend that handles all the functionalities. – mixable Jun 02 '20 at 15:46
  • @mixable It sounds promising on simplicity and I appreciate you to also elaborate on your approach in implementing complex expressions, variable declarations, basic functionalities, etc, in your "input language". –  Jun 02 '20 at 16:21

3 Answers

Maybe you can take a look at Docker. For example, you can copy the user code into a file on your server (without executing it) and then run it inside a container. This will allow you to:

  • run the code in a dedicated container which can be destroyed after the script has executed
  • run the code using different versions of PHP

Some examples:

docker run -v "$PWD":/usr/src -w /usr/src --rm php:7.4.5 php ./myScript.php

This will execute the file myScript.php in a new container based on PHP 7.4.5; once done, the container will be deleted.

The same thing using another PHP version:

docker run -v "$PWD":/usr/src -w /usr/src --rm php:5.6 php ./myScript.php

Edit regarding performance:

Obviously, running code in a container will be slower than running it directly from PHP.

For example, we can test it by running the following code:

<?php

function reverseArray(array $array): array {
    for ($i = 0; $i < count($array) / 2; $i++) {
        $tmp = $array[$i];
        $array[$i] = $array[count($array) -1 - $i]; 
        $array[count($array) - 1 - $i] = $tmp;
    }
    return $array;
}   


$tabToReverse = [5, 8, 95, 10, 6, 17, 42, 20];
echo 'Reversed array : '."\n";
echo  implode(' ', reverseArray($tabToReverse));

Same code with a syntax error :

<?php

// syntax error
function reverseArray($array: array): array {
    for ($i = 0; $i < count($array) / 2; $i++) {
        $tmp = $array[$i];
        $array[$i] = $array[count($array) -1 - $i]; 
        $array[count($array) - 1 - $i] = $tmp;
    }
    return $array;
}   


$tabToReverse = [5, 8, 95, 10, 6, 17, 42, 20];
echo 'Reversed array : '."\n";
echo  implode(' ', reverseArray($tabToReverse));

The following PHP code compares both executions (plus one run of the version with the syntax error):

<?php

/**
 * Run code using eval
 */
$start = microtime(true);
$code = str_replace('<?php', '', file_get_contents('./reverseArray.php'));
echo eval($code)."\n";
$end = microtime(true);
$duration = $end - $start;
echo "Duration using eval $duration\n"; 


/**
 * Run code using container
 */
$start = microtime(true);
$cmd = 'docker run -v "$PWD":/usr/src -w /usr/src --rm php:7.4.5 php ./reverseArray.php';
exec($cmd, $result);
echo implode("\n", $result)."\n";
$end = microtime(true);
$duration = $end - $start;
echo "Duration using container $duration\n";

/**
 * Run code using container with an error
 */
$start = microtime(true);
$cmd = 'docker run -v "$PWD":/usr/src -w /usr/src --rm php:7.4.5 php ./reverseArrayWithError.php';
exec($cmd, $resultWithError);
echo implode("\n", $resultWithError)."\n";
$end = microtime(true);
$duration = $end - $start;
echo "Duration using container $duration\n";

The results on my laptop are:

php ./runCode.php 
Reversed array : 
20 42 17 6 10 95 8 5
Duration using eval 0.00031089782714844
Reversed array :
20 42 17 6 10 95 8 5
Duration using container 0.79519391059875

Parse error: syntax error, unexpected ':', expecting ')' in /usr/src/reverseArrayWithError.php on line 3
Duration using container 0.81346988677979

As you can see, starting up the container takes time (about 0.8 s here, against a fraction of a millisecond for eval), but the code is executed in an isolated environment.

In all cases, the frontend part will be the same: it will use an Ajax request to POST the data to the server, wait for the result and display it.

Note 1: even if the code is executed in a dedicated container, it must be sanitized beforehand, as user input should never be trusted.

Note 2: this architecture requires you to manage the running containers in order to prevent overload. What happens if 10,000 users submit their code at the same time? But I think this is another topic.
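
One hedged way to act on both notes is to cap what each container may do when it is launched. The sketch below only builds the command line; buildSandboxCommand, the limit values and the /tmp/job-42 path are illustrative assumptions, and it assumes the image ships the coreutils timeout binary. The flags themselves (--network, --memory, --cpus, --pids-limit, --read-only, --cap-drop) are standard docker run options.

```php
<?php

/**
 * Build a `docker run` command that executes one script under tight limits.
 * The function name and the limit values are illustrative assumptions.
 */
function buildSandboxCommand(string $hostDir, string $script, int $timeoutSeconds = 5): string
{
    $parts = [
        'docker', 'run', '--rm',
        '--network', 'none',              // no network access from the container
        '--memory', '64m',                // cap memory usage
        '--cpus', '0.5',                  // cap CPU usage
        '--pids-limit', '32',             // cap the number of processes
        '--read-only',                    // read-only root filesystem
        '--cap-drop', 'ALL',              // drop all Linux capabilities
        '-v', $hostDir . ':/usr/src:ro',  // mount the user code read-only
        '-w', '/usr/src',
        'php:7.4.5',
        // Assumption: `timeout` exists in the image; it stops runaway scripts.
        'timeout', (string) $timeoutSeconds,
        'php', './' . basename($script),  // basename() blocks path traversal
    ];

    // Quote every part so that user-influenced values cannot inject shell syntax.
    return implode(' ', array_map('escapeshellarg', $parts));
}

$cmd = buildSandboxCommand('/tmp/job-42', 'reverseArray.php');
```

The resulting string can be passed to exec() in the same way as $cmd in the benchmark script. It limits the damage a single submission can do, but it does not replace a queue or rate limit for the many-users case.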

NicoM
  • Could you please elaborate on the performance considerations in this manner: running the code using the docker containers and then retrieving the data back to the originator script, compared to executing the code on the flow (regardless of security issues), or even using JavaScript code instead of PHP (or any other "lightweight" scripting language) –  Jun 02 '20 at 12:20

Given your limited and "basic" expectations of the "code", and in addition to the other approaches presented, you may "create" some sort of "assembly" that translates predefined single instructions (which may take arguments) into executable code. The "assembly" is just a means of establishing a strong correspondence between the instructions and the equivalent actions. For example, you may define a dec instruction to declare a variable with an optional initial value:

dec <variable> [initialValue]

The instruction dec myVar 4 then translates into the following executable code (expressed in PHP notation):

$myVar = 4;

Or, an add instruction with two operands to add the specified amount onto the provided variable:

add <variable> <amount>

The aforementioned instruction can be utilized as: add myVar 2, and should translate into the following:

$myVar += 2;

You might even add advanced control structures (such as conditionals or iterations):

if <criteria>
   (consequent)
fi

This is the basic idea that led to the diverse programming languages, and it confronts you with some pretty advanced topics! Nevertheless, your specific needs allow you to impose some relieving restrictions to keep things from becoming cumbersome. For the sake of simplicity, the "arguments" should only support single entities: either "literals" or "variables". This does cause verbosity and rules out algebraic expressions, but it makes everything much simpler for "you" by keeping you away from the advanced topics involved in expression evaluation!

dec needsMore false
gte needsMore myVar 5
if needsMore
   add myVar 3
fi

This should translate into:

$needsMore = false;
$needsMore = $myVar >= 5;
if ($needsMore) {
   $myVar += 3;
}
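
As a rough illustration of the idea, the sketch below handles the dec, add, gte, if and fi instructions shown above. Note that it swaps in a different technique: instead of translating to PHP source and evaluating it, it interprets each instruction directly, so no eval() is involved. The names runProgram, resolveArg and ARITY are hypothetical, and the error handling is deliberately minimal.

```php
<?php

// Minimum and maximum argument counts per instruction.
const ARITY = ['dec' => [1, 2], 'add' => [2, 2], 'gte' => [3, 3], 'if' => [1, 1], 'fi' => [0, 0]];

/** Resolve an argument: a literal (true, false, number) or a declared variable. */
function resolveArg(string $token, array $vars)
{
    if ($token === 'true') { return true; }
    if ($token === 'false') { return false; }
    if (is_numeric($token)) { return $token + 0; }
    if (!array_key_exists($token, $vars)) {
        throw new InvalidArgumentException("Unknown variable: $token");
    }
    return $vars[$token];
}

/** Run the instructions in $source against $vars and return the final variables. */
function runProgram(string $source, array $vars = []): array
{
    $skipDepth = 0; // > 0 while inside an `if` block whose condition was false
    foreach (preg_split('/\R/', $source) as $index => $line) {
        $args = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
        if ($args === []) { continue; } // blank line
        $op = array_shift($args);

        if ($skipDepth > 0) { // skipping: only track nesting of if/fi
            if ($op === 'if') { $skipDepth++; }
            if ($op === 'fi') { $skipDepth--; }
            continue;
        }
        if (!array_key_exists($op, ARITY)
            || count($args) < ARITY[$op][0] || count($args) > ARITY[$op][1]) {
            throw new InvalidArgumentException('Invalid instruction on line ' . ($index + 1));
        }
        // dec/add/gte write to their first argument, which must be a plain name.
        if ($op !== 'if' && $op !== 'fi' && !preg_match('/^[A-Za-z_]\w*$/', $args[0])) {
            throw new InvalidArgumentException('Invalid variable name on line ' . ($index + 1));
        }

        switch ($op) {
            case 'dec':
                $vars[$args[0]] = isset($args[1]) ? resolveArg($args[1], $vars) : null;
                break;
            case 'add':
                $vars[$args[0]] = resolveArg($args[0], $vars) + resolveArg($args[1], $vars);
                break;
            case 'gte':
                $vars[$args[0]] = resolveArg($args[1], $vars) >= resolveArg($args[2], $vars);
                break;
            case 'if':
                if (!resolveArg($args[0], $vars)) { $skipDepth = 1; }
                break;
            case 'fi':
                break; // end of a block that was executed
        }
    }
    if ($skipDepth > 0) {
        throw new InvalidArgumentException('Missing fi');
    }
    return $vars;
}

$program = "dec needsMore false\ngte needsMore myVar 5\nif needsMore\n   add myVar 3\nfi";
$result = runProgram($program, ['myVar' => 7]); // myVar becomes 10 because 7 >= 5
```

Interpreting the instructions keeps user input as data for the whole run, so only the operations you implemented can ever execute.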

As another simplification, prohibit the re-assignment of the "guard variable" in the repetition constructs. This restriction also enables you to rule out "infinite loops", or expensive ones, beforehand! You may impose whatever rules best fit your needs; they will also sanitize the user code securely, in the way you prefer.
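
A check of this kind could be sketched as follows. The loop syntax rep <guard> <times> ... per is purely hypothetical (no repetition instruction is defined above), as are the function name findGuardViolations and the limit of 1000 iterations. The sketch assumes the loop count must be a literal, so that the cost of a loop is known before anything runs.

```php
<?php

/**
 * Scan a program for loops that are too expensive or that modify their guard.
 * `rep <guard> <times>` ... `per` is a hypothetical loop syntax.
 * Returns a list of human-readable violations (empty when none are found).
 */
function findGuardViolations(string $source, int $maxIterations = 1000): array
{
    $writers = ['dec', 'add', 'gte']; // instructions that write to their first argument
    $guards = [];                      // guard variables of the currently open loops
    $errors = [];

    foreach (preg_split('/\R/', $source) as $index => $line) {
        $tokens = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
        if ($tokens === []) { continue; } // blank line
        $op = $tokens[0];
        $target = $tokens[1] ?? null;
        $lineNo = $index + 1;

        if ($op === 'rep') {
            $times = $tokens[2] ?? '';
            if ($target === null) {
                $errors[] = "line $lineNo: rep needs a guard variable";
                continue;
            }
            // Only literal, bounded counts are accepted, so the cost is known upfront.
            if (!ctype_digit($times) || (int) $times > $maxIterations) {
                $errors[] = "line $lineNo: loop count must be a literal between 0 and $maxIterations";
            }
            $guards[] = $target;
        } elseif ($op === 'per') {
            array_pop($guards); // leave the innermost loop
        } elseif ($target !== null && in_array($op, $writers, true) && in_array($target, $guards, true)) {
            $errors[] = "line $lineNo: '$target' is a loop guard and cannot be modified";
        }
    }

    return $errors;
}

$errors = findGuardViolations("dec i 0\nrep i 10\n   add i 1\nper");
// $errors holds one entry: the write to i on line 3
```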

To help your non-coder users, you should also create a visual aid that encapsulates the assembly and helps them with the "instructions" and the corresponding arguments (inserting, modifying or removing them), especially for the "selection" and "repetition" constructs.

someOne

One thing that comes to mind is Twig, which is the template engine used by Symfony. It has a sandbox mode that you can enable, making it safe to evaluate untrusted code. I used it before to allow users to write their own templates.

So although it's meant for templating, I think you could adapt it to achieve what you want.

Good luck!

lbrandao