I was trying to multiply each element of a list in parallel using multiprocessing.Process. But it seems that when I call multiprocessing.Process.start(), the main process waits synchronously for the newly started process to finish, without the parallelism I'm expecting.

The code:

import math
import time

N = 10000000
MAX_NUM = 100
THREADS_AMOUNT = 6
THREAD_LIST_LEN = math.ceil(N / THREADS_AMOUNT)

def multiply_vector(vector, num_multiply=2):
    print('start multiplication')
    result = [num * num_multiply for num in vector]
    print('end multiplication')
    return result

def main():
    import numpy as np
    from multiprocessing import Process

    random_list = np.random.rand(N)
    random_list = list(random_list)

    chunks = [random_list[i * THREAD_LIST_LEN: (i + 1) * THREAD_LIST_LEN] for i in range(THREADS_AMOUNT)]

    start = time.perf_counter()

    procs = []

    for chunk in chunks:
        proc = Process(target=multiply_vector, args=(chunk,))

        procs.append(proc)

    for proc in procs:
        print('start proccess')
        proc.start()
        print('after start process')
        print('----------------------------------------------------------------')


    for proc in procs:
        proc.join()

    end = time.perf_counter()
    print(f"{end - start:.5f}")

if __name__ == '__main__':
    main()

Logs

Can you help me find out what the problem is?

Questioner
  • I'd rename `THREAD` to `PROC`, but don't know the answer as [the docs suggest similar](https://docs.python.org/3/library/multiprocessing.html#:~:text=range(NUMBER_OF_PROCESSES)). If my MacBook requires using [ProcessPool](https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor-example) the docs should make that clear. – Cees Timmerman Dec 21 '22 at 15:08
  • Thanks for the answer, Cees! Yep, good idea to rename. Tried with ProcessPool, it gives same results =( – Questioner Dec 21 '22 at 18:06
  • I guess the reason is somewhere in `args` param, when I write code without it, everything works in parallel – Questioner Dec 21 '22 at 18:47
  • Both appear to run async on all cores as intended, but using `[random.random() for i in range(N)]` instead of `np.random.rand(N)` the program completes in 0.55987 instead of 12.68054 seconds on this Apple M1 Pro, apparently using more CPU on the 8 cores. – Cees Timmerman Dec 21 '22 at 20:29
  • On my machine it doesn't run async, you can check logs for proofs. – Questioner Dec 27 '22 at 15:09
  • You start more than one process before multiplying; async. – Cees Timmerman Dec 28 '22 at 20:35

1 Answer


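The problem is that the list contains <class 'numpy.float64'> objects instead of <class 'float'>. The two hold the same value according to this answer, but they are not equally cheap to send to a child process: the arguments in args may have to be pickled when the process starts (this is the case with the spawn start method, the default on macOS and Windows), and pickling millions of numpy.float64 scalars is most likely what keeps start() busy. The values themselves are identical: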

print(random_list[0].hex(), type(random_list[0]))
print(float(random_list[0]).hex(), type(float(random_list[0])))
0x1.7c61d92be8a60p-4 <class 'numpy.float64'>
0x1.7c61d92be8a60p-4 <class 'float'>

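Use random_list = [float(x) for x in np.random.rand(N)] as a workaround. Output before the change (numpy.float64 elements, 44.44382 s) and after it (float elements, 2.83933 s):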

<class 'numpy.float64'> 0x1.4d5e3d424fca0p-3
<class 'float'> 0x1.4d5e3d424fca0p-3
1666667 [0.16277740343682456, 0.2236445811127703, 0.24302636516506426]
1666667 [0.9839224808778051, 0.7885649288285472, 0.5115890361207274]
1666667 [0.6110548657514351, 0.4254468671297228, 0.755695655477851]
1666667 [0.7875976996371834, 0.1860834865513915, 0.6425133427321916]
1666667 [0.7248206935294595, 0.912748881029397, 0.2777262300084712]
1666665 [0.5982501130550539, 0.8190458209204438, 0.36885142475263655]
start proccess
start multiplication
after start process
----------------------------------------------------------------
start proccess
end multiplication
start multiplication
end multiplication
after start process
----------------------------------------------------------------
start proccess
start multiplication
end multiplication
after start process
----------------------------------------------------------------
start proccess
start multiplication
end multiplication
after start process
----------------------------------------------------------------
start proccess
start multiplication
end multiplication
after start process
----------------------------------------------------------------
start proccess
start multiplication
end multiplication
after start process
----------------------------------------------------------------
44.44382

(venv)> python multiprocdemo.py
<class 'float'> 0x1.d819f64362c80p-8
<class 'float'> 0x1.d819f64362c80p-8
1666667 [0.00720369589124481, 0.2714234018047603, 0.2374615793977285]
1666667 [0.013831371021074346, 0.12480464518099776, 0.23994124785173276]
1666667 [0.6838916452098018, 0.4791172707405815, 0.8731298576461729]
1666667 [0.38914850916876287, 0.9744634265322073, 0.8872740902618148]
1666667 [0.2565996268152796, 0.8731909755923012, 0.29488407178637677]
1666665 [0.41678072679755296, 0.5166087260179636, 0.15102593638101824]
start proccess
after start process
----------------------------------------------------------------
start proccess
start multiplication
end multiplication
after start process
----------------------------------------------------------------
start proccess
start multiplication
end multiplication
after start process
----------------------------------------------------------------
start proccess
start multiplication
end multiplication
after start process
----------------------------------------------------------------
start proccess
start multiplication
end multiplication
after start process
----------------------------------------------------------------
start proccess
start multiplication
end multiplication
after start process
----------------------------------------------------------------
start multiplication
end multiplication
2.83933
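
A rough way to check where the time goes is to compare how long pickling takes for the two kinds of list. This is only a sketch under the assumption that pickling args is the bottleneck; `pickle_seconds` is a hypothetical helper, and N is smaller than in the question so it runs quickly. It also uses `ndarray.tolist()`, which returns built-in floats and can replace the list comprehension in the workaround.

```python
import pickle
import time

import numpy as np

N = 200_000  # assumption: smaller than the question's N to keep the sketch fast


def pickle_seconds(values):
    """Hypothetical helper: seconds spent in pickle.dumps for the given list."""
    start = time.perf_counter()
    pickle.dumps(values)
    return time.perf_counter() - start


array = np.random.rand(N)
numpy_scalars = list(array)     # list of numpy.float64, as in the question
python_floats = array.tolist()  # list of built-in float

# Print the element type next to the time needed to pickle the whole list.
print(type(numpy_scalars[0]), f"{pickle_seconds(numpy_scalars):.4f} s")
print(type(python_floats[0]), f"{pickle_seconds(python_floats):.4f} s")
```

If the first timing is much larger than the second on your machine, that would be consistent with start() spending its time serializing the chunk rather than waiting for the child.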

Alternatively, skip multiprocessing and let numpy multiply the float64 array directly.
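
A minimal sketch of that numpy-only variant, reusing N from the question; `num_multiply` mirrors the default of the question's multiply_vector, and `random_array` is just an illustrative name. Nothing is pickled or copied to another process here, so the cost discussed above does not apply.

```python
import time

import numpy as np

N = 10000000
num_multiply = 2

# Keep the data as a numpy array instead of converting it to a list.
random_array = np.random.rand(N)

start = time.perf_counter()
# Vectorized multiplication: one operation over the whole float64 array.
result = random_array * num_multiply
end = time.perf_counter()

print(f"{end - start:.5f}")
```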

Cees Timmerman