10

The documentation for concurrent.futures.ThreadPoolExecutor says:

Changed in version 3.5: If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5, assuming that ThreadPoolExecutor is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for ProcessPoolExecutor.

I want to understand why the default max_workers value depends on the number of CPUs. Regardless of how many CPUs I have, only one Python thread can run at any point in time (because of the GIL).

Let us assume each thread is I/O intensive: it spends only 10% of its time on the CPU and 90% of its time waiting for I/O. Let us then assume we have 2 CPUs. We only need 10 threads to utilize 100% of one CPU. We can't utilize any more CPU time, because only one thread runs at any point in time. This holds true even if there are 4 CPUs.
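The arithmetic in the paragraph above can be written out as a quick sketch (the 10% figure is the question's own assumption):

```python
# Sketch of the estimate above: a thread that spends fraction p of its time
# on the CPU occupies p of a core, so about 1/p such threads saturate one
# core. Under the GIL, extra cores don't raise this ceiling for pure-Python work.
p = 0.10                      # assumed CPU fraction per thread (from the question)
threads_to_saturate = 1 / p   # threads needed to keep one core fully busy
print(threads_to_saturate)    # 10.0
```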

So why is the default max_workers decided based on the number of CPUs?
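To see the default concretely, this sketch compares the pool's worker cap with `os.cpu_count()`. Note that `_max_workers` is a private attribute (read here purely for illustration), and that the exact formula has changed across Python versions — 3.8 switched the default to `min(32, os.cpu_count() + 4)`:

```python
import os
from concurrent.futures import ThreadPoolExecutor

cpus = os.cpu_count() or 1
with ThreadPoolExecutor() as pool:
    # _max_workers is private and not a supported API; inspected here
    # only to show what default the executor actually picked.
    print(f"{cpus} CPUs -> default max_workers = {pool._max_workers}")
```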

user2357112
Lone Learner
    You are talking about the GIL, but that is an implementation detail of CPython; other runtimes don't have this issue. – Sraw May 18 '19 at 03:18

2 Answers

8

It's a lot easier to check the number of processors than to check how I/O bound your program is, especially at thread pool startup, when your program hasn't really started working yet. There isn't really anything better to base the default on.

Also, adding the default was a pretty low-effort, low-discussion change. (Previously, there was no default.) Trying to get fancy would have been way more work.

That said, getting fancier might pay off. For example, some kind of dynamic system could adjust the thread count based on load, so you don't have to decide the count at startup, when you have the least information. It won't happen unless someone writes it, though.
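The "dynamic" idea might look something like this minimal sketch (`GrowingPool` is a made-up name, not a stdlib class): it spawns a worker only when submitted work is visibly backing up, instead of fixing a count up front.

```python
import queue
import threading

class GrowingPool:
    """Hypothetical sketch: grow the worker count on demand, up to a cap."""

    def __init__(self, max_threads=32):
        self.tasks = queue.Queue()
        self.threads = []
        self.max_threads = max_threads
        self.lock = threading.Lock()

    def _worker(self):
        # Each worker just pulls and runs tasks forever (daemon threads,
        # so there is no shutdown protocol in this sketch).
        while True:
            fn, args = self.tasks.get()
            fn(*args)

    def submit(self, fn, *args):
        self.tasks.put((fn, args))
        with self.lock:
            # Grow only when there is a visible backlog and headroom left.
            if not self.tasks.empty() and len(self.threads) < self.max_threads:
                t = threading.Thread(target=self._worker, daemon=True)
                t.start()
                self.threads.append(t)
```

Incidentally, CPython's own ThreadPoolExecutor already creates worker threads lazily on submit, up to max_workers — the dynamic part it lacks is choosing the cap itself from observed load.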

user2357112
2

CPython's thread implementation is lightweight. It mostly hands the thread to the OS, with some accounting for the GIL (and signal handling). Increasing the number of threads in proportion to cores usually does not work out: since the threads are managed by the OS, on a machine with many cores the scheduler greedily tries to run as many ready threads as possible whenever there is a context switch. All of them try to acquire the GIL and only one succeeds. This leads to a lot of waste - worse than the linear calculation that assumes only one thread can run at a given time. Because of this, if you are using pure CPU-bound threads in the executor, there is no reason to link the default to cores. But we should not deprive users who really want the CPU power, and are okay with releasing the GIL, of the ability to utilize the cores. So arguably the default value should be linked to the number of cores in this case - if you assume most people running Python know what they are doing.
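The contention described above is easy to observe with a pure-Python CPU-bound task. This sketch runs the same work serially and via a thread pool; timings are machine-dependent, so treat the comparison qualitatively:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def burn(n=1_000_000):
    # Pure-Python loop: holds the GIL for the whole computation.
    total = 0
    for i in range(n):
        total += i
    return total

start = time.perf_counter()
results_serial = [burn() for _ in range(4)]
serial = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results_threaded = list(pool.map(lambda _: burn(), range(4)))
threaded = time.perf_counter() - start

# Under the GIL the threaded run is typically no faster than the serial one,
# and the extra context switching can even make it slower.
print(f"serial: {serial:.2f}s  threaded: {threaded:.2f}s")
```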

Now if the threads in the executor are I/O-bound, then as you rightly mention, the maximum useful count is 1/p, where p is the fraction of a CPU each thread needs. When deciding the default, it is impossible to know p beforehand. The assumed minimum of 0.2 (i.e. at least 5 threads per core) does not look too bad. But my guess is that usually p will be much lower, so the limiting factor may never be the CPU (and if it is, we again run into the thrashing problem of multiple cores described above). So linking the default to the number of cores will probably not end up being unsafe (unless the threads do heavy processing or you have too many cores!).
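In those terms, the old cpu_count() * 5 default amounts to assuming each thread needs roughly a fifth of a core:

```python
# The default of 5 threads per core implicitly assumes each thread needs
# about p = 1/5 of a CPU. At the question's p = 0.1, twice as many threads
# would fit before the CPU became the bottleneck.
threads_per_core = 5
implied_p = 1 / threads_per_core
print(implied_p)  # 0.2
```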

Prodipta Ghosh