8

I have a bunch of Python projects with untrusted WSGI apps inside them. I need to run them simultaneously and safely, so I need restrictions on directory access and Python module usage, plus limits on CPU and memory.

I consider two approaches:

  1. Import via imp-module WSGI-object from defined file, and running it with pysandbox. Now I have SandboxError: Read only object when doing:

    # allow only stdout access inside the sandbox
    self.config  = SandboxConfig('stdout')
    self.sandbox = Sandbox(self.config)
    # pick the (suffix, mode, type) description for plain ".py" source files
    self.s = [s for s in imp.get_suffixes() if s[2] == imp.PY_SOURCE][0]
    # load_module expects the file's own path, not its directory
    wsgi_obj = imp.load_module("run", open(path + "/run.py", "r"), path + "/run.py", self.s).app
    …
    return self.sandbox.call(wsgi_obj, environ, start_response)
    
  2. Modify the Python interpreter to exclude potentially risky modules, run the apps in parallel processes, and communicate via ZMQ/Unix sockets. I don't even know where to start here.

What could you recommend?

sashab
  • 1,534
  • 2
  • 19
  • 36
  • Process separation is definitely a good idea. Even better would be to use virtualization. I don't know about pysandbox, but I heard that the existing solutions for sandboxing Python code aren't especially good. – Niklas B. May 14 '12 at 17:13
    Sandboxing in CPython isn't very good, but other Python interpreters, particularly PyPy, have more complete sandboxing support. – Andrew Gorcester May 14 '12 at 18:11
  • I tried PyPy sandboxing. It is too complicated. – sashab May 14 '12 at 21:36

1 Answer

3

I would run your applications with gunicorn: one process and configuration per app, with user-level permissions (each untrusted app under a different user). Each gunicorn instance would serve on localhost on a user-range port, and nginx or another web server would proxy to them to route and serve them to the web.
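A minimal per-app gunicorn config file might look like the sketch below; the port, user name, and paths are placeholders, not values from the question:

```python
# gunicorn_app1.py -- one config file per untrusted app (all names are placeholders)
bind = "127.0.0.1:8001"       # localhost-only, user-range port; one port per app
workers = 2
user = "untrusted-app1"       # drop worker privileges to a dedicated low-privilege user
group = "untrusted-app1"
chdir = "/srv/apps/app1"      # start workers inside the app's own directory tree
```

You would then start each app with something like `gunicorn -c gunicorn_app1.py run:app` and point an nginx `proxy_pass` at `127.0.0.1:8001`.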

Heroku takes this a step further and sandboxes each gunicorn instance (or unicorn or apache or arbitrary other server) in a virtual machine. This is probably the most secure possible way to do things, and definitely the best option for reliably limiting CPU and memory usage, but you may not need to go that far depending on your requirements.

One of the advantages of this kind of approach is that each application can run on a different version of Python if appropriate; with the virtual machine sandbox they can even run on different operating systems entirely.

Edit: To limit memory usage without using a VM sandbox approach, see this question. To limit CPU usage, tweak the gunicorn settings -- spin up one gevent-style worker per core an application is allowed to use.
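As a sketch of the non-VM route, the stdlib resource module can cap memory and CPU per process on Unix; the limit values below are illustrative, and calling this from gunicorn's post_fork server hook is one way to apply it to each worker:

```python
import os
import resource

def limit_resources(max_mem_bytes, max_cpu_seconds):
    """Cap memory and CPU for the calling process (Linux/Unix only)."""
    # RLIMIT_AS caps total virtual address space; allocations beyond it fail
    resource.setrlimit(resource.RLIMIT_AS, (max_mem_bytes, max_mem_bytes))
    # RLIMIT_CPU caps CPU seconds; the kernel sends SIGXCPU when exceeded
    resource.setrlimit(resource.RLIMIT_CPU, (max_cpu_seconds, max_cpu_seconds))
```

In a gunicorn config file you could hook this up with `def post_fork(server, worker): limit_resources(512 * 1024 * 1024, 60)` so every forked worker is capped independently.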

Edit again: One completely different approach would be to use PyPy's sandboxing mechanism, which should be much more secure than CPython plus a sandboxing module. However, I would prefer the gunicorn or gunicorn + virtual machine approach.
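For the dynamic subdomain-to-app routing discussed in the comments, a rough sketch of the generate-and-reload approach; the template, config directory, and domain are hypothetical and would need adapting to your nginx layout:

```python
import subprocess
from pathlib import Path

# Hypothetical server-block template; doubled braces are literal nginx braces
SERVER_BLOCK = """\
server {{
    listen 80;
    server_name {subdomain}.example.com;
    location / {{
        proxy_pass http://127.0.0.1:{port};
        proxy_set_header Host $host;
    }}
}}
"""

def add_app(subdomain, port, conf_dir="/etc/nginx/conf.d"):
    """Write a per-app server block, then reload nginx without dropping connections."""
    conf = Path(conf_dir) / ("%s.conf" % subdomain)
    conf.write_text(SERVER_BLOCK.format(subdomain=subdomain, port=port))
    # "nginx -s reload" re-reads the config while existing workers finish gracefully
    subprocess.check_call(["nginx", "-s", "reload"])
```

Removing an app is the mirror image: delete the `.conf` file and reload again.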

Andrew Gorcester
  • 19,595
  • 7
  • 57
  • 73
  • What could you recommend as a solution for routing? It should be fast and async, and because of some additional logic nginx is not a solution. Tornado? – sashab May 14 '12 at 21:35
  • Hmmmm, I'm not sure, I use nginx as a front for gunicorn myself and have not experimented much with that part of the stack. Does the additional logic require you to write custom code? In that case maybe Tornado would be best. Gunicorn has some sort of Tornado support but I believe it's for a different purpose than what you have in mind. – Andrew Gorcester May 15 '12 at 00:16
  • Well, I basically need to route subdomains to apps, and do it dynamically (the number of apps could change). – sashab May 15 '12 at 05:37
  • You could probably programmatically add and remove entries to the nginx config file when an app is added, and reload the config without disrupting service. It wouldn't be as flexible as a custom-coded solution, but the operational advantages of using such a tried-and-true routing method might outweigh that. – Andrew Gorcester May 15 '12 at 06:11