I have recently been working on a pet project in python using flask. It is a simple pastebin with server-side syntax highlighting support with pygments. Because this is a costly task, I delegated the syntax highlighting to a celery task queue and in the request handler I'm waiting for it to finish. Needless to say this does no more than alleviate CPU usage to another worker, because waiting for a result still locks the connection to the webserver. Despite my instincts telling me to avoid premature optimization like the plague, I still couldn't help myself from looking into async.
Async
If have been following python web development lately, you surely have seen that async is everywhere. What async does is bringing back cooperative-multitasking, meaning each "thread" decides when and where to yield to another. This non-preemptive process is more efficient than OS-threads, but still has it's drawbacks. At the moment there seem to be 2 major approaches:
- event/callback style multitasking
- coroutines
The first one provides concurrency through loosely-coupled components executed in an event loop. Although this is safer with respect to race conditions and provides for more consistency, it is considerably less intuitive and harder to code than preemptive multitasking.
The other one is a more traditional solution, closer to threaded programming style, the programmer only having to manually switch context. Although more prone to race-conditions and deadlocks, it provides an easy drop-in solution.
Most async work at the moment is done on what is known as IO-bound tasks, tasks that block to wait for input or output. This is usually accomplished through the use of polling and timeout based functions that can be called and if they return negatively, context can be switched.
Despite the name, this could be applied to CPU-bound tasks too, which can be delegated to another worker(thread, process, etc) and then non-blockingly waited for to yield. Ideally, these tasks would be written in an async-friendly manner, but realistically this would imply separating code into small enough chunks not to block, preferably without scattering context switches after every line of code. This is especially inconvenient for existing synchronous libraries.
Due to the convenience, I settled on using gevent for async work and was wondering how is to be dealt with CPU-bound tasks in an async environment(using futures, celery, etc?).
How to use async execution models(gevent in this case) with traditional web frameworks such as flask? What are some commonly agreed-upon solutions to these problems in python(futures, task queues)?
EDIT: To be more specific - How to use gevent with flask and how to deal with CPU-bound tasks in this context?
EDIT2: Considering how Python has the GIL which prevents optimal execution of threaded code, this leaves only the multiprocessing option, in my case at least. This means either using concurrent.futures or some other external service dealing with processing(can open the doors for even something language agnostic). What would, in this case, be some popular or often-used solutions with gevent(i.e. celery)? - best practices