pathos pools: Renew worker processes after N tasks

Question

I am building a parallel python application, that essentially calls a C wrapper around an external library. Parallelism is needed to be able to run simultaneously computations on all CPU cores.

I ended up using pathos.multiprocessing.ProcessPool, but these pools lacks the maxtaskperchild argument of the standard multiprocessing.Pool class constructor (see reference here ). I need this feature because the C library relies on the process clock to define some execution time limits, that are eventually reached when the tasks pile up.

Is there a way to ask the ProcessPool manager to renew worker processes after a given number of tasks ?

Example code to clarify my intent:

from pathos.pools import ProcessPool
from os import getpid
import collections

def print_pid(task_id):
    pid = getpid()
    return pid

if __name__ == "__main__":
    NUM_TASKS = 50
    MAX_PER_CHILD = 2


    # limit each process to maximum MAX_PER_CHILD tasks
    # we would like the pool to exit the process and spawn a new one
    # when a task counter reaches the limit
    # below argument 'maxtasksperchild' would work with standard 'multiprocessing'
    pool = ProcessPool(ncpu=2, maxtasksperchild=MAX_PER_CHILD)
    results = pool.map(print_pid, range(NUM_TASKS), chunksize=1)

    tasks_per_pid = dict(collections.Counter(results))
    print(tasks_per_pid)

# printed result
# {918: 8, 919: 6, 920: 6, 921: 6, 922: 6, 923: 6, 924: 6, 925: 6}
# observe that all processes did more than MAX_PER_CHILD tasks

what I tried

setting the maxtasksperchild in the ProcessPool constructor (cf naive example above) does not seem to do anything
calling sys.exit() in worker function makes the program hang
I have found hints when diving into the source code

score 2 · Accepted Answer · answered Aug 23 '19 at 13:39

2

In pathos.multiprocessing there are two pools: ProcessPool and _ProcessPool. The former is designed to have an augmented pool lifecycle that minimizes start-up time, and has persistence and restart capabilities -- however, lacks some of the "multiprocessing" keywords. The latter (_ProcessPool) is one level of API design down, and provides an interface that is identical to the multiprocessing Pool interface (but uses dill). So, take a look at the _ProcessPool.

answered Aug 23 '19 at 13:39

Mike McKerns

33,715
8
119
139

thanks, and kudos for maintaining this library! Does `_ProcessPool` have a chance to disappear in the future, since it is not documented ? – Grégoire Roussel Aug 24 '19 at 15:48
There is no chance for it to disappear in the future. It actually shows up in two other places as the exact same object... if you look in `pathos.pools`, it's `_ProcessPool` there as well, and both modules import `pathos.helpers.ProcessPool as _ProcessPool`. The `pathos` `ProcessPool` is built on top of it, it's not going away. – Mike McKerns Aug 24 '19 at 16:04

pathos pools: Renew worker processes after N tasks

what I tried

1 Answers1