I am trying to speed up some code with multiprocessing, but the speedup is smaller than expected. I know that spawning child processes has overhead, and so does transferring data between the parent process and the child processes. However, even after I minimized these overheads, the performance is still not what I expected. So I wrote a simple test:
import multiprocessing
import time

import numpy as np


def test_function():
    # Time the work inside the child process so that pool start-up
    # and data-transfer overheads are not measured.
    start_time = time.time()
    n = 1000
    x = np.random.rand(n, n)
    p = np.random.rand(n, n)
    y = 0
    # Element-wise power and sum, done in plain Python loops.
    for i in range(n):
        for j in range(n):
            y += np.power(x[i][j], p[i][j])
    print("= Running time:", time.time() - start_time)


def main():
    procs = [1, 2, 3, 4, 5, 6]
    for proc in procs:
        print("Number of process:", proc)
        pool = multiprocessing.Pool(processes=proc)
        # One empty argument tuple per worker, so test_function runs `proc` times.
        para = [()] * proc
        pool.starmap(test_function, para)
        pool.close()
        pool.join()


if __name__ == '__main__':
    main()
The test function contains only two nested loops and some arithmetic. No data is transferred between the main process and the child processes, and the time is measured inside each child process, so process start-up overhead is not included. Here is the output:
Number of process: 1
= Running time: 4.253360033035278
Number of process: 2
= Running time: 4.404280185699463
= Running time: 4.411274671554565
Number of process: 3
= Running time: 4.580170154571533
= Running time: 4.59316349029541
= Running time: 4.610152959823608
Number of process: 4
= Running time: 4.908967733383179
= Running time: 4.926954030990601
= Running time: 4.997913122177124
= Running time: 5.09885048866272
Number of process: 5
= Running time: 5.406658172607422
= Running time: 5.441636562347412
= Running time: 5.4576287269592285
= Running time: 5.473618030548096
= Running time: 5.621527671813965
Number of process: 6
= Running time: 6.195171594619751
= Running time: 6.225149869918823
= Running time: 6.256133079528809
= Running time: 6.290108919143677
= Running time: 6.339082717895508
= Running time: 6.3710620403289795
The code was run on Windows 10 on an i7 CPU with 4 cores and 8 logical processors. The running time of each process clearly increases as the number of processes increases. Is this caused by the operating system, by a limitation of the CPU itself, or by other hardware?
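One way to narrow this down might be to record CPU time alongside wall-clock time in each child. The sketch below is a variant of the test above, not a definitive diagnostic: `timed_work` and `compare` are hypothetical names, and `n` defaults to a small value so it runs quickly (set `n=1000` to match the original test). If wall-clock time grows while CPU time stays flat, the process was probably waiting to be scheduled; if both grow together, the core itself was likely executing the work more slowly.

```python
import multiprocessing
import time

import numpy as np


def timed_work(n):
    """Run the same element-wise power loop; return (wall, cpu) seconds."""
    x = np.random.rand(n, n)
    p = np.random.rand(n, n)
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time used by this process only
    y = 0.0
    for i in range(n):
        for j in range(n):
            y += np.power(x[i][j], p[i][j])
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def compare(n=200, max_procs=6):
    """Run timed_work in 1..max_procs processes and print both timings."""
    for procs in range(1, max_procs + 1):
        with multiprocessing.Pool(processes=procs) as pool:
            results = pool.map(timed_work, [n] * procs)
        for wall, cpu in results:
            print(f"procs={procs} wall={wall:.3f}s cpu={cpu:.3f}s")


if __name__ == "__main__":
    compare()
```

The timings are returned to the parent rather than printed in the child, so the output stays grouped by process count.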
Update: here is the output on Linux. Interestingly, with 5 processes, two of them take much longer, and with 6 processes, four of them do. Is this related to the logical processors? Do the physical cores have to share or switch resources between their logical processors?
Number of process: 1
= Running time: 4.039047479629517
Number of process: 2
= Running time: 4.150756597518921
= Running time: 4.159530878067017
Number of process: 3
= Running time: 4.228744745254517
= Running time: 4.261997938156128
= Running time: 4.324823379516602
Number of process: 4
= Running time: 4.342475891113281
= Running time: 4.347326755523682
= Running time: 4.350982427597046
= Running time: 4.370999574661255
Number of process: 5
= Running time: 4.369337797164917
= Running time: 4.391499757766724
= Running time: 4.43767237663269
= Running time: 6.300408124923706
= Running time: 6.31215763092041
Number of process: 6
= Running time: 4.366948366165161
= Running time: 4.38712739944458
= Running time: 6.366809844970703
= Running time: 6.370593786239624
= Running time: 6.422687530517578
= Running time: 6.433435916900635
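To probe the logical-processor hypothesis on Linux, each worker could be pinned to one chosen logical CPU before it starts the loop. The following is a minimal sketch under stated assumptions: `pin_to_cpu` is a hypothetical helper, `os.sched_setaffinity` is only available on some platforms (Linux, not Windows), and which logical CPU numbers share a physical core is system-specific and not determined here.

```python
import os


def pin_to_cpu(cpu_index):
    """Restrict the calling process to a single logical CPU.

    cpu_index selects from the CPUs this process is currently allowed to
    use (wrapping around if it is too large). Returns the resulting
    affinity set, or None on platforms without sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = sorted(os.sched_getaffinity(0))  # 0 means the calling process
    target = allowed[cpu_index % len(allowed)]
    os.sched_setaffinity(0, {target})
    return os.sched_getaffinity(0)
```

Calling `pin_to_cpu(k)` at the start of the k-th task, and then comparing runs where the chosen CPUs are on separate physical cores against runs where they share a core, might show whether the jump in running time follows the core sharing.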