
I have a use case where I have to process some documents, which takes some time. So I tried batching the documents and processing the batches with multiprocessing; it worked well and completed in less time, as expected. There are also multiple stages of processing the docs, and I used multiprocessing at each stage individually. When I fire multiple concurrent requests to do the processing, after serving some 70+ requests I noticed that some of the processes are not killed.

I am performing a load test with Locust, where I create 5 users with a wait time of 4 - 5 seconds, and each request takes approximately 3.5 seconds. I tried the multiprocessing package and various other wrappers (pebble, parallel-execute, pathos, concurrent.futures).
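For reference, the Locust setup looks roughly like this (a minimal sketch; the /process endpoint and payload are placeholders for the actual API):

from locust import HttpUser, task, between

class DocUser(HttpUser):
    # 5 of these users are spawned by Locust
    wait_time = between(4, 5)  # each user waits 4 - 5 seconds between requests

    @task
    def process_docs(self):
        # hypothetical endpoint that triggers the document processing
        self.client.post("/process", json={"docs": ["..."]})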

What I basically do is:

from multiprocessing import Pool

with Pool(processes=5) as p:
    out = p.starmap(do_something, args)
    p.close()
    p.terminate()

The official documentation also says that the pool will be closed automatically when it is used in a with statement like this. When I stop firing requests, the last one or two requests remain stagnant. I found this simply by printing "Started {req_num}" and "Served {req_num}" before and after the processing (see the sketch below). Before adding p.close() and p.terminate() I could see more processes still running after I stopped triggering requests; after adding them, only the last triggered request is not served. And if I start triggering requests again and stop them after a while, again the last one or two requests are not served and their processes are stagnant. So the stagnant processes accumulate.
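The instrumentation is as simple as this (a sketch; do_something and args stand in for the real per-stage processing):

from multiprocessing import Pool

def handle_request(req_num, args):
    print(f"Started {req_num}")
    with Pool(processes=5) as p:
        out = p.starmap(do_something, args)
        p.close()
        p.terminate()
    print(f"Served {req_num}")
    return out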

Every wrapper I mentioned has a different way of closing the pool, and I tried those too. For example, with pathos:

p = Pool(processes=5)
out = p.map(do_something, args)
p.join()
p.close()
p.terminate()

And with concurrent.futures.ThreadPoolExecutor it was p.shutdown(). I faced the same issue with every other wrapper; there, the number of stagnant processes was even higher than with multiprocessing.Pool.
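With concurrent.futures the pattern was roughly this (a sketch; ProcessPoolExecutor is shown, but ThreadPoolExecutor is shut down the same way, and do_something and args are the same placeholders as above):

from concurrent.futures import ProcessPoolExecutor

ex = ProcessPoolExecutor(max_workers=5)
futures = [ex.submit(do_something, *a) for a in args]
out = [f.result() for f in futures]
ex.shutdown(wait=True)  # wait for pending work, then release the workers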

I need help in finding the reason, or the right way to do this. Any help would be much appreciated!

Arjun Sankarlal

1 Answer


To shut down the pool properly, just call:

p.close()  # stop accepting new tasks; worker processes exit once all assigned work has completed
p.join()   # wait for all the worker processes to terminate
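Applied to the snippet from the question, that looks like the sketch below. Note that exiting a with Pool(...) block calls terminate(), which kills the workers immediately, so the explicit close()/join() pair replaces the context manager (and join() must come after close(), not before):

from multiprocessing import Pool

p = Pool(processes=5)
try:
    out = p.starmap(do_something, args)
finally:
    p.close()  # no more tasks can be submitted; workers exit once done
    p.join()   # block until every worker process has exited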
alex_noname