5

As part of an effort to make the scikit-image examples gallery interactive, I would like to build a web service that receives a Python code snippet, executes it, and provides me with the generated output image.

For safety, the Python instances launched should be sandboxed and resource-controlled, so I was thinking of using LXC containers.

Is this a good way to approach the problem? If so, what is the recommended way of launching one Python VM per request?

Zword
Stefan van der Walt

4 Answers

5

Stefan, perhaps "Docker" could be of use? I get the impression that you could constrain the VM that the application is run in -- an example web service:

http://docs.docker.io/en/latest/examples/python_web_app/

You could try running the application on Digital Ocean, like so:

https://www.digitalocean.com/community/articles/how-to-install-and-use-docker-getting-started

nfaggian
  • As far as I know, Docker makes an update to the image each time it is instantiated. Since I might be launching thousands of these things, that might be problematic. What is your experience? – Stefan van der Walt Feb 08 '14 at 08:41
  • No direct experience; people have been mentioning Docker frequently these days. Sage Cell is interesting. Have you considered something *really* simple, like a Flask application? – nfaggian Feb 09 '14 at 07:37
  • Writing the Flask app is not a problem, but getting the subprocesses to be sandboxed is the main challenge. Yesterday, someone told me that Docker is capable of throwing away any changes after execution, returning the image to its virgin state, so perhaps I'll investigate that option some more. – Stefan van der Walt Feb 09 '14 at 10:59
  • After some more reading, I think this can be done with a small web app + Docker. – Stefan van der Walt Feb 12 '14 at 14:23
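
To make the "small web app" half of that idea concrete, here is a minimal sketch using only the standard library's WSGI conventions. It is not taken from the thread: `make_app`, `run_snippet` and `MAX_BODY` are hypothetical names, and `run_snippet` stands in for whatever sandboxed runner (Docker or otherwise) actually executes the code and returns PNG bytes.

```python
MAX_BODY = 64 * 1024  # arbitrary cap on snippet size, chosen for this sketch


def make_app(run_snippet):
    """Build a WSGI app around run_snippet(code: str) -> bytes.

    run_snippet is a hypothetical callable that executes the code in a
    sandbox and returns the generated image as PNG bytes.
    """
    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", [("Allow", "POST")])
            return [b""]
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        if length <= 0:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"empty request body"]
        if length > MAX_BODY:
            start_response("413 Payload Too Large", [("Content-Type", "text/plain")])
            return [b"snippet too large"]
        try:
            code = environ["wsgi.input"].read(length).decode("utf-8")
        except UnicodeDecodeError:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"snippet must be UTF-8"]
        # All untrusted execution happens inside run_snippet, never in this process.
        image = run_snippet(code)
        start_response("200 OK", [("Content-Type", "image/png"),
                                  ("Content-Length", str(len(image)))])
        return [image]
    return app
```

The app itself never calls `exec` on the request body; keeping the runner injectable means the sandboxing strategy can be swapped without touching the HTTP layer.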
4

[disclaimer: I'm an engineer at Continuum working on Wakari]

Wakari Enterprise (http://enterprise.wakari.io) is aiming to do exactly this. We're hoping to back-port the functionality into Wakari Cloud (http://wakari.io), so that "published" IPython Notebooks can expose some knobs for controlling input variables, be "invoked" in a sandboxed state, and hand the output back to the user.

However, for something that exists now, you should look at Sage Notebook. A few years ago several people worked hard on a Sage Notebook Cell Server that could do exactly what you are asking for: execute small code snippets. I haven't followed it since then, but a quick search suggests it is still alive and well:

http://sagecell.sagemath.org/?q=ejwwif

http://sagecell.sagemath.org

http://www.sagemath.org/eval.html

For the last URL, check out Graphics->Mandelbrot and you can see that Sage already has some great capabilities for UI widgets that are tied to the "cell execution".

IanSR
2

I think Docker is the way to go for this. The instances are very lightweight, and Docker is designed to spawn hundreds of instances at a time (spin-up time is a fraction of a second, versus a couple of seconds for traditional VMs). Configured correctly, I believe it also gives you a completely sandboxed environment, so you no longer have to worry about sandboxing Python itself :-D
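
As a rough illustration of what "configured correctly" might look like, the sketch below builds a `docker run` command line with resource limits and a throwaway container. The flags are those of the current Docker CLI and may differ in older releases; the helper names, the `/work/snippet.py` path, and the default limits are assumptions made for this example, not something the thread specifies. The snippet is assumed to write its image to stdout.

```python
import subprocess


def docker_argv(image, script_dir, *, memory="256m", cpus="0.5", pids=64):
    """Return the argv for running /work/snippet.py in a locked-down container."""
    return [
        "docker", "run",
        "--rm",                                  # discard the container and its changes on exit
        "--network", "none",                     # no network access
        "--memory", memory,                      # memory cap
        "--cpus", cpus,                          # CPU share cap
        "--pids-limit", str(pids),               # guard against fork bombs
        "--read-only",                           # read-only root filesystem
        "--tmpfs", "/tmp",                       # scratch space that vanishes with the container
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--volume", f"{script_dir}:/work:ro",    # snippet mounted read-only
        image,
        "python", "/work/snippet.py",
    ]


def run_in_container(image, script_dir, timeout=30):
    """Run the snippet and capture stdout/stderr.

    Note: the timeout stops the docker client process; the container
    itself may need a separate cleanup step if it is still running.
    """
    return subprocess.run(docker_argv(image, script_dir),
                          capture_output=True, timeout=timeout)
```

Because every request starts from the same image and `--rm` throws the container away afterwards, nothing one snippet writes is visible to the next.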

0

I'm not sure if you really have to go as far as setting up LXC containers:

There is seccomp-nurse, a Python sandbox that leverages the seccomp feature of the Linux kernel.

Another option would be to use PyPy, which has explicit support for sandboxing out of the box.

In any case, do not use pysandbox: it is broken by design and has severe security risks.

jbaiter
  • Thanks for these pointers. Do you know if PyPy would be able to run the scientific toolstack, including numpy, scipy and scikit-image? Also, since seccomp-nurse does not support dlopen, I doubt it can run Cython extensions used in most of these packages. – Stefan van der Walt Jan 14 '14 at 11:18
  • PyPy does not support numpy; take a look here: http://buildbot.pypy.org/numpy-status/latest.html. For the scientific stack I recommend using Anaconda (https://store.continuum.io/cshop/anaconda), which is a free distribution. For every request you could create a conda virtual environment and run your code there. – danielfrg Feb 08 '14 at 05:36
  • Is a conda virtual environment guaranteed to be sandboxed from the rest of the system? Also, does it allow resource restriction (memory, CPU)? – Stefan van der Walt Feb 08 '14 at 10:03
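
On the resource-restriction half of that last question: a conda environment isolates packages, not processes, so any limits have to come from elsewhere. One option on Unix is the standard library's `resource` module, sketched below. This caps CPU time and address space for a child interpreter but is not a security sandbox on its own; the function names and default limits are assumptions for illustration.

```python
import resource
import subprocess
import sys


def _apply_limits(cpu_seconds, memory_bytes):
    """Return a function that lowers CPU and address-space limits in the child."""
    def set_limits():
        for kind, wanted in ((resource.RLIMIT_CPU, cpu_seconds),
                             (resource.RLIMIT_AS, memory_bytes)):
            _, hard = resource.getrlimit(kind)
            if hard != resource.RLIM_INFINITY:
                wanted = min(wanted, hard)  # cannot exceed the existing hard limit
            # Set soft and hard together so the child cannot raise them again.
            resource.setrlimit(kind, (wanted, wanted))
    return set_limits


def run_limited(code, cpu_seconds=5, memory_bytes=1 << 30, wall_seconds=30):
    """Run code in a fresh interpreter with CPU, memory and wall-clock caps (Unix only)."""
    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=_apply_limits(cpu_seconds, memory_bytes),  # runs in the child before exec
        capture_output=True,
        text=True,
        timeout=wall_seconds,
    )
```

A snippet that exceeds the CPU limit is killed by a signal, which shows up as a non-zero return code. Filesystem and network access are untouched, so this would complement a container or seccomp-based sandbox rather than replace one.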