I have recently been working on a pet project in Python using Flask. It is a simple pastebin with server-side syntax highlighting provided by Pygments. Because highlighting is costly, I delegated it to a Celery task queue, and the request handler waits for the task to finish. Needless to say, this does no more than offload the CPU work to another worker, because waiting for the result still ties up the connection to the web server. Despite my instincts telling me to avoid premature optimization like the plague, I couldn't stop myself from looking into async.

Async

If you have been following Python web development lately, you have surely seen that async is everywhere. What async does is bring back cooperative multitasking, meaning each "thread" decides when and where to yield to another. This non-preemptive approach is more efficient than OS threads, but it still has its drawbacks. At the moment there seem to be two major approaches:

  • event/callback style multitasking
  • coroutines

The first provides concurrency through loosely coupled components executed in an event loop. Although this is safer with respect to race conditions and provides more consistency, it is considerably less intuitive and harder to code than preemptive multitasking.

The second is a more traditional solution, closer to the threaded programming style: the programmer only has to switch context manually. Although more prone to race conditions and deadlocks, it provides an easy drop-in solution.
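To make the coroutine style concrete, here is a minimal sketch using only plain Python generators. The names `worker` and `run_round_robin` are hypothetical and not part of gevent or any other library; each `yield` stands in for the manual context switch described above.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: do one unit of work, then yield control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # explicit context switch back to the scheduler


def run_round_robin(tasks):
    """Hypothetical scheduler: resume each task in turn until all finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)  # run the task until its next yield
        except StopIteration:
            continue  # the task finished, so drop it
        ready.append(task)  # otherwise queue it for another turn


log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)
```

Nothing here is preempted: a task that never yields would starve all the others, which is the drawback of the cooperative model.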

Most async work at the moment targets what are known as IO-bound tasks: tasks that block while waiting for input or output. This is usually accomplished with polling and timeout-based functions: the function is called, and if it reports that nothing is ready yet, context can be switched.

Despite the name, this could be applied to CPU-bound tasks too: they can be delegated to another worker (thread, process, etc.) and then waited for without blocking, yielding in the meantime. Ideally, these tasks would be written in an async-friendly manner, but realistically this would mean splitting the code into chunks small enough not to block, preferably without scattering context switches after every line of code. This is especially inconvenient for existing synchronous libraries.
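The delegation idea can be sketched with the standard library alone. Note that this uses `asyncio` as a stand-in for gevent, and `expensive_highlight` is a hypothetical placeholder for a blocking call such as Pygments highlighting. A thread pool keeps the sketch portable; for genuinely CPU-bound work one would likely pass a `ProcessPoolExecutor` instead.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def expensive_highlight(source):
    """Hypothetical stand-in for a blocking, CPU-heavy call."""
    return source.upper()


async def handle_request(executor, source):
    loop = asyncio.get_running_loop()
    # The event loop stays free to run other coroutines while the
    # worker executes the blocking function.
    return await loop.run_in_executor(executor, expensive_highlight, source)


async def main():
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Two "requests" are in flight at once; gather keeps their order.
        return await asyncio.gather(
            handle_request(executor, "print(1)"),
            handle_request(executor, "print(2)"),
        )


results = asyncio.run(main())
print(results)
```

The synchronous function is left untouched; only the call site changes, which is what makes this approach attractive for existing libraries.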


Because of its convenience, I settled on gevent for async work and was wondering how CPU-bound tasks should be dealt with in an async environment (using futures, Celery, etc.?).

How do you use async execution models (gevent in this case) with traditional web frameworks such as Flask? What are some commonly agreed-upon solutions to these problems in Python (futures, task queues)?

EDIT: To be more specific: how do you use gevent with Flask, and how do you deal with CPU-bound tasks in this context?

EDIT2: Since Python has the GIL, which prevents optimal execution of threaded code, this leaves only the multiprocessing option, in my case at least. This means using either `concurrent.futures` or some external service that handles the processing (which could even open the door to something language-agnostic). What would, in this case, be some popular or often-used solutions with gevent (e.g. Celery)? I'm looking for best practices.

nikitautiu
  • You can adapt this pattern to almost any library: http://bottlepy.org/docs/dev/async.html#event-callbacks. I would suggest `evergreen`, because it allows combining cooperative tasks (greenlets) with long-running tasks by integrating a modified version of `concurrent.futures`. – schlamar May 06 '13 at 10:31

2 Answers


It should be thread-safe to do something like the following to move CPU-intensive tasks into background threads:

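from threading import Thread

from flask_mail import Message  # assumption: Message comes from Flask-Mail

# `mail` is assumed to be a Flask-Mail `Mail` instance created elsewhere.

def send_async_email(msg):
    # Runs in the background thread, so the caller is not blocked.
    mail.send(msg)

def send_email(subject, sender, recipients, text_body, html_body):
    msg = Message(subject, sender=sender, recipients=recipients)
    msg.body = text_body
    msg.html = html_body
    # Hand the message to a new thread and return immediately.
    thr = Thread(target=send_async_email, args=[msg])
    thr.start()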

If you need something more complicated, then perhaps Flask-Celery or the `multiprocessing` library with `Pool` might be useful to you.

I'm not too familiar with gevent, though, and I can't imagine what more complexity you might need or why.

I mean, if you're aiming for the efficiency of a major worldwide website, then I'd recommend building C++ applications to do your CPU-intensive work and using Flask-Celery or `Pool` to run that process. (This is what YouTube does when mixing C++ and Python.)

Dexter
  • Any source for how YouTube does it (a blog post or something)? – nikitautiu Apr 13 '13 at 07:36
  • I ended up using Celery, but as I said, waiting for a result ends up blocking the web server. The solution was to serve the WSGI app with either the gevent server or gunicorn with the gevent worker. For the async result I simply poll `ready()`, and if it has not completed I yield with a canonical `gevent.sleep()`. – nikitautiu Apr 13 '13 at 12:43
  • I'm afraid I don't have a link, but the google blog might be a place to look for it. – Dexter Apr 15 '13 at 06:50
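
The poll-and-yield approach mentioned in the comments can be sketched generically. `wait_cooperatively` is a hypothetical helper, and the demo uses a standard-library future with `time.sleep` so that it runs without Celery or gevent installed. The assumption is that under gevent one would pass the Celery result's `ready` method and `gevent.sleep` instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait_cooperatively(is_ready, sleep, interval=0.01):
    """Poll is_ready(), calling sleep() between checks; return the poll count."""
    polls = 0
    while not is_ready():
        polls += 1
        # Assumption: with gevent.sleep here, other greenlets would run
        # during this pause. time.sleep only pauses the current thread.
        sleep(interval)
    return polls


def slow_task():
    """Hypothetical stand-in for a queued highlighting job."""
    time.sleep(0.05)
    return "done"


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_task)
    wait_cooperatively(future.done, time.sleep)
    outcome = future.result()
print(outcome)
```

The polling interval trades latency against wasted wake-ups, so it is worth tuning to the typical task duration.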

How about simply using a ThreadPool and a Queue? You can then process your work in a separate thread in a synchronous manner, and you won't have to worry about blocking at all. That said, Python is not well suited to CPU-bound tasks in the first place, so you should also think about spawning subprocesses.
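
A minimal sketch of that idea, using only `threading` and `queue` from the standard library. The helper names `worker` and `run_jobs` are hypothetical, and a real application would probably keep the worker threads alive between requests rather than starting them per batch.

```python
import queue
import threading


def worker(jobs, results):
    """Pull jobs until a None sentinel arrives; run each one synchronously."""
    while True:
        job = jobs.get()
        if job is None:
            break
        func, arg = job
        results.put((arg, func(arg)))


def run_jobs(func, args, num_workers=2):
    """Run func over args on a small pool of threads and collect the results."""
    jobs, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for arg in args:
        jobs.put((func, arg))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker so every thread exits
    for t in threads:
        t.join()
    # Results arrive in completion order, so key them by their argument.
    return dict(results.get() for _ in args)


squares = run_jobs(lambda n: n * n, [1, 2, 3])
print(squares)
```

Because of the GIL, threads like these help mainly when the work releases the lock (IO or C extensions); for pure-Python CPU work, the subprocess route is the safer bet.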

freakish
  • That is a valid solution, and there are a couple of others I can think of. The point is, I was wondering how to integrate gevent into Flask, and how people who use gevent deal with situations where it's borderline impossible to make a synchronous task "green". – nikitautiu Apr 12 '13 at 12:54