
I am observing an increase in the execution time of a Python script when I trigger parallel instances of it using ProcessPoolExecutor on a 56-core machine. The script abc.py imports a heavy Python library, which takes around 1 second.

time python ~/abc.py

real 0m0.846s
user 0m0.620s
sys 0m0.078s

Test Method

import shlex
from subprocess import Popen, PIPE

def test():
    command = "python /u/deeparora/abc.py"
    p = Popen(shlex.split(command), stdout=PIPE, stderr=PIPE)
    # communicate() drains stdout/stderr and waits for the child;
    # wait() alone can deadlock if the child fills the pipe buffers
    p.communicate()

The code below also takes about 1 second, which is expected.

Serial Execution

import concurrent.futures
from concurrent.futures import ProcessPoolExecutor

pool = ProcessPoolExecutor(max_workers=1)
futures = []
 
for index in range(0, 1):
    futures.append(pool.submit(test))

for future in concurrent.futures.as_completed(futures):
    pass

However, the code below takes 5 seconds to execute on the 56-core machine.

Parallel Execution

import concurrent.futures
from concurrent.futures import ProcessPoolExecutor

pool = ProcessPoolExecutor(max_workers=50)
futures = []
 
for index in range(0, 50):
    futures.append(pool.submit(test))

for future in concurrent.futures.as_completed(futures):
    pass

I checked the execution time in the process logs and noticed that the script's (abc.py) execution time has also increased, from 1 second to 4 seconds. Can somebody help me understand this behavior?
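One way to isolate where the time goes is to time the heavy import inside abc.py itself and log it per process. A minimal instrumentation sketch (hypothetical, since the real script cannot be shared; psutil stands in for the heavy library):

# hypothetical instrumentation for abc.py: time the heavy import per process
import os
import time

start = time.perf_counter()
import psutil  # stand-in for the heavy library
elapsed = time.perf_counter() - start

print(f"pid={os.getpid()} heavy import took {elapsed:.2f}s")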

Deepanshu Arora
  • What does `abc.py` do? If it does I/O operations such as accessing the disk, you may be bottlenecked on your disk throughput rather than on CPU. – rchome Jan 05 '22 at 05:45
  • @rchome abc.py calls psutil to find details of a process that is running on the same host as abc.py – Deepanshu Arora Jan 05 '22 at 11:41
  • You noted that abc.py imports a heavy python library which may create an I/O bottleneck. Can you provide more details or the full code for that? – Jan Wilamowski Jan 06 '22 at 01:54
  • @JanWilamowski I cannot share the full code, but under the hood psutil is imported to find whether a process with a given pid exists: `psutil.Process(pid=1234)` – Deepanshu Arora Jan 07 '22 at 07:33
  • I suggest you do some more detailed profiling. Just printing out the time taken by operations like the importing of the heavy library in your threads, then compare that to the single-threaded version. If it's much higher (accumulated) then you know that's the issue. – Jan Wilamowski Jan 07 '22 at 08:41
  • hyperthreading or actual cores? – Bharel Jan 08 '22 at 12:41
  • @Bharel actual cores are 56 – Deepanshu Arora Jan 09 '22 at 11:58
  • I have two questions: Why use `open` instead of `Popen`? I tried your code but is giving me a TypeError indicating that open miss the `file` argument. And why use the argument `command_or_args`? I can see it neither on the standard `open` (https://docs.python.org/3/library/functions.html#open) method nor `Popen` (https://docs.python.org/3/library/subprocess.html#subprocess.Popen) – Miguel Jan 10 '22 at 09:17
  • @Miguel I have fixed my code. These were copy-paste errors – Deepanshu Arora Jan 10 '22 at 10:50

1 Answer


I tried to run this and found interesting results (see the graph referenced at the end).

  1. When the given function is too cheap, its execution time is smaller than the pool creation time, so adding more workers increases the total time.

  2. To validate this, check the experiment with sleep(0.001) below.

  3. From the graph: the total time first decreases as the number of workers increases, but after a point it begins to increase again, because the cost of creating and closing workers exceeds the computation time itself.

    from concurrent.futures import ProcessPoolExecutor
    from time import sleep, time

    import matplotlib.pyplot as plt

    def cube(x):
        sleep(0.001)  # simulate a small, fixed amount of work
        return x * x * x

    if __name__ == '__main__':
        values = [3, 4, 5, 6] * 200
        times = []
        worker_counts = list(range(1, 20))

        for num_workers in worker_counts:
            print(f'Processing workers: {num_workers}')
            st_time = time()

            with ProcessPoolExecutor(max_workers=num_workers) as exe:
                # map 'cube' over the iterable; leaving the 'with' block
                # waits for all tasks to complete
                results = list(exe.map(cube, values))

            end_time = time()
            times.append(end_time - st_time)

        plt.plot(worker_counts, times)
        plt.title('Number of workers vs. total time')
        plt.xlabel('Number of workers')
        plt.ylabel('Time taken in seconds')
        plt.show()

(Graph: number of workers vs. total time taken.)
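One way to amortize that overhead (a sketch under assumptions; the worker count and chunk size here are illustrative, not tuned for the asker's machine): create the pool once and batch the work with map's chunksize argument, so each worker handles many items per round trip:

    from concurrent.futures import ProcessPoolExecutor

    def cube(x):
        return x * x * x

    if __name__ == '__main__':
        values = [3, 4, 5, 6] * 200
        # one pool, reused for the whole workload; chunksize=50 sends
        # items to the workers in batches, cutting per-task IPC overhead
        with ProcessPoolExecutor(max_workers=8) as exe:
            results = list(exe.map(cube, values, chunksize=50))
        print(results[:4])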

Manish Sahu