Why does per-process overhead constantly increase for multiprocessing?

Question

I was counting for a 6 core CPU with 12 logical CPUs in a for-loop till really high numbers several times.

To speed things up i was using multiprocessing. I was expecting something like:

Number of processes <= number of CPUs = time identical
number of processes + 1 = number of CPUs = time doubled

What i was finding was a continuous increase in time. I'm confused.

the code was:

#!/usr/bin/python

from multiprocessing import Process, Queue
import random
from timeit import default_timer as timer

def rand_val():
    num = []
    for i in range(200000000):
        num = random.random()
    print('done')

def main():

    for iii in range(15):
        processes = [Process(target=rand_val) for _ in range(iii)]
        start = timer()
        for p in processes:
            p.start()

        for p in processes:
            p.join()

        end = timer()
        print(f'elapsed time: {end - start}')
        print('for ' + str(iii))
        print('')

if __name__ == "__main__":
    main()
    print('done')

result:

elapsed time: 14.9477102 for 1
elapsed time: 15.4961154 for 2
elapsed time: 16.9633134 for 3
elapsed time: 18.723183399999996 for 4
elapsed time: 21.568377299999995 for 5
elapsed time: 24.126758499999994 for 6
elapsed time: 29.142095499999996 for 7
elapsed time: 33.175509300000016 for 8

. . .

elapsed time: 44.629786800000005 for 11
elapsed time: 46.22480710000002 for 12
elapsed time: 50.44349420000003 for 13
elapsed time: 54.61919949999998 for 14

Can you please clarify what you are asking? It is not really clear to me what kind of assumptions you make, and how they are contradicted by your results. For example, a 6 core+HT CPU can only perform 6 computations at once – assuming that your system is doing other things as well, a kink at 3-4 processes seems reasonable. — MisterMiyagi, Mar 23 '21 at 09:12
Hi, I try to understand how multiprocessing functions work. I have the idea, that for a CPU-intensive job (like a really big for-loop), it is beneficial to have multiple CPUs work parallel. So in my n-loop, there should (in my opinion) n-CPUs working on the loop. If there are more jobs than CPUs i would expect one job to wait until another CPU is done with its job. Thats the reason I expect a doubled time. — clemens, Mar 23 '21 at 09:18
Ah, no it won't work like that. The operating system scheduler will juggle them all around, so each gets told to go to sleep for a bit to let the other processes work away. The time to complete will vary, and definitely will not resemble a sequential "complete first 6 jobs, then start 7th job" — Amiga500, Mar 23 '21 at 09:22
First, I cannot reproduce your results. On my system, the execution time is constant until hitting the CPU count. Second, your assumption of "one job to wait until another CPU is done" is wrong – you are starting all processes *at once* and the OS will interleave their execution. You can see this for example if you print at the start of each process as well. — MisterMiyagi, Mar 23 '21 at 09:23
Thanks, but that is weird, that was the result I was expecting. And yes, I know, that they all start at the same time and i was expecting the n+1 wait for a free CPU (with n being the numbers of CPUs) — clemens, Mar 23 '21 at 09:26
They *do* wait for free a CPU, but they *do not* wait for another process to finish *completely*. Your OS does preemptive scheduling, meaning that it regularly pauses active processes to let other processes run. In effect, the processes are executed interleaved. Again, you could see this if you were to regularly print the progress of each process. — MisterMiyagi, Mar 23 '21 at 09:28

MisterMiyagi · Accepted Answer · 2021-03-23T09:42:19.420

There are two wrong assumptions you make:

Processes are not free. Merely adding processes adds overhead to the program.
Processes do not own CPUs. A CPU interleaves execution of several processes.

The first point is why you see some overhead even though there are less processes than CPUs. Note that your system usually has several background processes running, so the point of "less processes than CPUs" is not clearcut for a single application.

The second point is why you see the execution time increase gradually when there are more processes than CPUs. Any OS running mainline Python does preemptive multitasking of processes; roughly, this means a process does not block a CPU until it is done, but is paused regularly so that other processes can run.
In effect, this means that several processes can run on one CPU at once. Since the CPU can still only do a fixed amount of work per time, all processes take longer to complete.

Thanks! But why is from https://zetcode.com/python/multiprocessing/ the code: worker_pool.py true? Why does it double the time in this example? — clemens, Mar 23 '21 at 10:20
@clemens The code you seem to be referring to uses ``pool.map``, which *spreads* the workload over multiple cores – each core does 1/n work. Your code *duplicates* the workload - each core still does 1 work. — MisterMiyagi, Mar 23 '21 at 10:28

score 2 · Answer 2 · answered Mar 23 '21 at 08:58

2

I don't understand what you are trying to achieve?

You are taking the same work, and running it X times, where X is the number of SMPs in your loop. You should be taking the work and dividing it by X, then sending a chunk to each SMP unit.

Anyway, with regards what you are observing - you are seeing the time it takes to spawn and close the separate processes. Python isn't quick at starting new processes.

answered Mar 23 '21 at 08:58

Amiga500

1,258
1
6
11

The code is just for understanding reasons. I know, that for each loop, I double the work. I try to understand, whether there is just n-CPUs working on the n-loops. Thanks for the input about the time to create tasks. I hadn't considered this yet – clemens Mar 23 '21 at 09:20

Lie Ryan · Answer 3 · 2021-03-23T09:52:17.870

Your test is faulty.

Imagine this, it takes 1 day for one farmer to work a 10km^2 farmland using a single tractor. If there are two farmers working 20km^2 farms, why are you expecting two farmers working twice the amount of farmlands using two tractors to take less time?

You have 6 CPU cores, your village has 6 tractors, but nobody has money to buy private tractors. As the number of workers (processes) on the village increased, the number of tractors remain the same, so everyone has to share the limited number of tractors.

In an ideal world, two farmers working twice the amount of work using two tractors would take exactly the same amount of time as one farmer working one portion of work, but in real computers the machine has other work to do even if it seems idle. There are task switching, the OS kernel has to run and monitor hardware devices, memory caches needs to be flushed and invalidated between CPU cores, your browser needs to run, the village elder is running a meeting to discuss who should get the tractors and when, etc.

As the number of workers increases beyond the number of tractors, farmers don't just hog the tractors for themselves. Instead they made an arrangement that they'd pass the tractors around every three hours or so. This means that the seventh farmer don't have to wait for two days to get their share of tractor time. However, there's a cost to transferring tractors between farmlands, just as there are costs for CPU to switch between processes; task switch too frequently and the CPU is not actually doing work, and switch to infrequently and you get resource starvation as some jobs takes too long to start being worked on.

A more sensible test would be to keep the size of farmland constant and just increase the number of farmers. In your code, that would correspond to this change:

def rand_val(num_workers):
    num = []
    for i in range(200000000 / num_workers):
        num = random.random()
    print('done')

def main():
    for iii in range(15):
        processes = [Process(target=lambda: rand_val(iii)) for _ in range(iii)]
        ...

"why are you expecting two farmers working twice the amount of farmlands to take less time" – They don't. They expect it to be constant as long as they at least as many farmers as separate plots of land. Their data shows that time increases anyways before that point. — MisterMiyagi, Mar 23 '21 at 09:26
... and that it does not double if there are more farmers than land. — clemens, Mar 23 '21 at 09:31

Why does per-process overhead constantly increase for multiprocessing?

3 Answers3