
I've heard something like: "If you want to get maximum performance from a parallel application, you should create as many processes as your computer has CPUs, and in each process create some number (how many?) of threads."

Is it true?

I wrote a piece of code implementing this idiom:

import multiprocessing, threading

number_of_processes = multiprocessing.cpu_count()
number_of_threads_in_process = 25   # some constant


def one_thread():
    # very heavyweight function with lots of CPU/IO/network usage
    do_main_work()


def one_process():
    for _ in range(number_of_threads_in_process):
        t = threading.Thread(target=one_thread, args=())
        t.start()


for _ in range(number_of_processes):
    p = multiprocessing.Process(target=one_process, args=())
    p.start()

Is it correct? Will my do_main_work function really run in parallel, without running into any GIL issues?

Thank you.

vortexxx192

3 Answers


It depends very much on what you're doing.

Keep in mind that in CPython, only one thread at a time can execute Python bytecode (because of the GIL). So for a computation-intensive problem in CPython, threads won't help you much.

One way to spread out work that can be done in parallel is to use a multiprocessing.Pool. By default this does not use more processes than your CPU has cores. Using many more processes will mainly have them fighting over resources (CPU, memory) rather than getting useful work done.
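For example, a minimal sketch of that approach (assuming the work can be expressed as a plain function applied to a list of independent inputs; process_item and its body are hypothetical placeholders, not your do_main_work):

import multiprocessing

def process_item(item):
    # stand-in for a CPU-heavy computation on one piece of the input
    return item * item

if __name__ == '__main__':
    # Pool() with no argument starts one worker process per CPU core
    with multiprocessing.Pool() as pool:
        results = pool.map(process_item, range(100))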

But taking advantage of multiple processors requires that you have work for them to do! In other words, if the problem cannot be divided into smaller pieces that can be calculated separately and in parallel, many CPU cores will not be of much use.

Additionally, not all problems are bound by the amount of calculation that has to be done.

The RAM of a computer is much slower than the CPU. If the data set you're working on is much bigger than the CPU's caches, reading data from and writing results back to RAM might become the speed limit. This is called being memory-bound.

And if you are working on much more data than can fit in the machine's memory, your program will be doing a lot of reading from and writing to disk. A disk is slow compared to RAM and very slow compared to a CPU, so your program becomes I/O-bound.

Roland Smith
# very heavyweight function with lots of CPU/IO/network usage

CPU-heavy work will suffer because of the GIL, so for that part you'll only get a benefit from multiple processes.

I/O and network (network is in fact also a kind of I/O) won't be affected much by the GIL, because the lock is explicitly released before a blocking I/O operation and reacquired after it completes. There are macro definitions in CPython for this:

Py_BEGIN_ALLOW_THREADS
... Do some blocking I/O operation ...
Py_END_ALLOW_THREADS

There will still be a performance hit, because the GIL is still taken and released in the wrapping code, but you still get better performance with multiple threads.
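As a rough illustration (a sketch only; the URLs and the fetch function are made up, not taken from your code), threads can overlap blocking network I/O even under the GIL:

import threading
import urllib.request

urls = ["http://example.com/"] * 10   # placeholder URLs

def fetch(url):
    # the GIL is released while the socket read blocks,
    # so the other threads can run in the meantime
    with urllib.request.urlopen(url) as response:
        response.read()

threads = [threading.Thread(target=fetch, args=(url,)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()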

Finally, and this is a general rule, not only for Python: the optimal number of threads/processes depends on what the program is actually doing. Generally, if it uses the CPU intensively, there is almost no performance boost from running more processes than there are CPU cores. For example, the Gentoo documentation says that the optimal number of threads for the compiler is the number of CPU cores + 1.

ElmoVanKielmo

I think the number of threads you are using per process is too high. On most Intel processors each core can run 2 hardware threads (Hyper-Threading), and the number of cores varies from 2 (Intel Core i3) to 6 (Intel Core i7). So when all the processes are running, the maximum number of threads that can actually execute simultaneously is about 6 * 2 = 12.
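If you want to check this on your own machine, os.cpu_count() reports the number of logical cores (physical cores times hardware threads per core); the figure of 12 above is just an example for a 6-core, Hyper-Threaded CPU:

import os

# number of logical cores the OS exposes,
# e.g. 12 on a 6-core CPU with two hardware threads per core
print(os.cpu_count())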