
I have a Python program that is already multithreaded and I'd like to replace some of the threads with processes in order to reduce context switching and utilize gevent for async I/O.
The main process is I/O bound so I'd like to use gevent in order to be able to handle a lot of concurrent I/O. We'll call it the Receiver component of my system.

The rest of the program is mostly CPU bound so I'd like to have each process have some threads that handle requests from the Receiver. These are my worker processes.
The reason I chose threads for handling multiple requests within one process is that threads are cheaper to create and destroy. If the program receives a lot of requests it can automatically scale up by starting more threads; when the load decreases it can get rid of the extra threads to avoid the overhead of context switching.

Forking with gevent can cause some problems and gipc exists exactly to solve those problems.
The worker threads do sometimes read from various sources such as caches and databases, but if I understand correctly the GIL is released when I/O occurs, allowing another thread to run.

If I do decide I want gevent inside my workers, I can (I think) avoid monkey-patching the threading module and assign a greenlet pool to each worker process. When combining gevent with threads, will the GIL still be released on I/O so that another thread can execute until the I/O call completes?

Finally there's another process which saves the response to a database. It's naturally I/O bound so gevent would be an excellent choice to perform this action.

I have read about the dangers of mixing threads and prefork. I'm not going to create any threads in the main process, so no locking mechanisms such as mutexes will be copied into the child processes, and I am not going to fork any of my child processes. Is it safe to assume this design won't get me into trouble at any stage? Does Python mitigate some of the problems of combining preforking and threading?
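To make the worker design concrete, here is a minimal stdlib-only sketch of one worker process: a pool of threads pulling requests off a queue. gevent/gipc and the autoscaling logic are omitted for brevity, and names like `handle` and `MAX_THREADS` are just illustrative:

```python
import queue
import threading

MAX_THREADS = 4          # stand-in for the dynamic upper bound on the pool
requests = queue.Queue() # requests arriving from the Receiver
results = queue.Queue()  # responses to hand off to the saver process

def handle(item):
    # Stand-in for the mostly CPU-bound work done per request.
    return item * item

def worker_loop():
    while True:
        item = requests.get()
        if item is None:  # poison pill: shut this thread down
            break
        results.put(handle(item))

threads = [threading.Thread(target=worker_loop) for _ in range(MAX_THREADS)]
for t in threads:
    t.start()

for n in range(8):       # simulate incoming requests
    requests.put(n)
for _ in threads:        # one poison pill per thread
    requests.put(None)
for t in threads:
    t.join()
```

In the real system the fixed loop of `requests.put` calls would be replaced by messages from the Receiver, and the pool would grow and shrink with load.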

martineau
the_drow
  • If your workers truly are CPU-bound, threads are exactly *not* the way to scale, because of the GIL. So your overall architecture looks perverted (or better, inverted ;)) to me - you *could* use threads for your Receiver, because they'll block. Being async via gevent reduces the resource footprint, but shouldn't affect performance that much. But then your workers should be based on the standard library's multiprocessing module. And you could pre-allocate a few dozen of them, because again, if you were 100% CPU-bound, it makes no sense to have more processes than cores. Add *some* blocking IO to it, so add some processes. – deets Dec 14 '14 at 13:46
  • I am aware of the GIL but not as afraid of it as other programmers are. For what reason would you not use gevent for I/O-only tasks? The autoscaling feature is one I really need, since our workload varies, and creating and destroying processes is much more expensive. Should I not be concerned about that? Also, having only as many processes as I have cores means each server can process only a few requests at a time, which is pretty expensive. – the_drow Dec 14 '14 at 13:53
  • Also why should one choose blocking I/O over nonblocking I/O? – the_drow Dec 14 '14 at 13:54
  • Well, you contradict yourself here: if you are CPU-bound in your workers, you *must* be afraid of the GIL, because it effectively serializes execution. Which is the exact reason for the introduction of the multiprocessing module. Regarding blocking/threaded IO vs. non-blocking: I'm not saying you shouldn't use gevent. But AFAIK async processes are no magical cure for anything; all they do is reduce the footprint of your process resource-wise (less memory), because the OS manages the IO dispatch instead of you having threads sitting on it. Relevant in high-load scenarios. No idea if you have one. – deets Dec 14 '14 at 13:59
  • To elaborate some more: if your task is truly 100% CPU-bound, how do you imagine you can serve *more* requests than your number of cores? It's physically impossible. As few things are 100% CPU-bound, you can have a few more processes doing the work, but creating and re-creating them every now and then is not a burden your OS and overall app will suffer from. But if you want to serve more requests than you have cores, you need a cluster of machines - again with processes doing the heavy lifting. In C/C++/Java you could instead use threads, but there, too, the limits of the core count apply. – deets Dec 14 '14 at 14:13
  • My tasks are **mostly** CPU bound. If I trade context switches for parallelism I can process more events concurrently. Yes, each will be a bit slower, but won't it be quicker (if I define a sensible threshold of maximum threads in the worker's pool) than blocking the response until a worker is available? – the_drow Dec 14 '14 at 14:24
  • Well, *nothing* is 100% CPU bound. Otherwise, the task couldn't get parameters, nor influence the world through output. But it's a fact that a Python process with multiple threads can *only* run on one core. So you have to launch a pool of processes to utilise several cores. And as your tasks need IO, yes, you can use threads. But what is your reasoning for doing that in your *workers* when your Receiver is supposedly better off using async IO for the exact same thing - dispatching IO? – deets Dec 14 '14 at 14:30
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/66848/discussion-between-deets-and-the-drow). – deets Dec 14 '14 at 14:37

1 Answer


Python's GIL will prevent any actual CPU parallelism within a single Python process. So while you can use multithreading or async IO to deal with a multitude of requests per worker, for true parallelism you need Python's multiprocessing package. You should probably use a Pool with maxtasksperchild set to a few hundred tasks or so, and you must pay attention to the number of actual processes: if your task is truly hard on the CPU, you can stall your system if no cores are left for doing "other stuff". But this can only be determined through experimentation.
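A minimal sketch of that setup, with one process per core and workers recycled after a fixed number of tasks (`handle_request` and its payload are placeholders for your actual CPU-bound work):

```python
import multiprocessing

def handle_request(n):
    # Placeholder for the CPU-bound work done per request.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # One process per core; maxtasksperchild recycles each worker
    # after that many tasks, bounding memory growth over time.
    with multiprocessing.Pool(
        processes=multiprocessing.cpu_count(),
        maxtasksperchild=300,
    ) as pool:
        results = pool.map(handle_request, [10, 20, 30])
        print(results)
```

Unlike threads, each pool worker is a separate interpreter with its own GIL, so the CPU-bound calls genuinely run in parallel across cores.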

deets
  • If I tweak https://docs.python.org/2/library/sys.html#sys.setcheckinterval in some way, would that mean I can live with the multithreaded program for the time being? It simply isn't worth my time to rewrite it right now. – the_drow Dec 14 '14 at 16:23
  • You are perfectly fine living with your program - it's yours :) It will simply not be parallel; whether that's acceptable is up to your discretion. For the usability, or rather lack thereof, of your suggestion, read http://pymotw.com/2/sys/threads.html – deets Dec 15 '14 at 09:28