I have a function which I would like to be executed several times in parallel, but with only a defined number of instances at the same time.

The natural way to do this seems to be to use multiprocessing.Pool. Specifically, the documentation says that

A frequent pattern (...) is to allow a worker within a pool to complete only a set amount of work before being exiting, being cleaned up and a new process spawned to replace the old one. The maxtasksperchild argument to the Pool exposes this ability to the end user.

maxtasksperchild is defined as:

maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None, which means worker processes will live as long as the pool.

I am not clear on what "task" means here. If I want at most 4 instances of my worker running in parallel, should I create the multiprocessing.Pool like this?

pool = multiprocessing.Pool(processes=4, maxtasksperchild=4)

How do processes and maxtasksperchild work together? Could I set processes to 10 and still have only 4 workers running (effectively leaving 6 processes idle)?

WoJ
  • When you do a `p.map(f, s)` where `p` is a `Pool`, each element of the sequence `s` counts as a task. `p.apply(f)` counts as one task. – Dan D. Sep 10 '15 at 09:19
  • You got the meaning of `maxtasksperchild` all wrong. However, *why* would you want to keep around 6 idle processes? How is this better than creating only 4 processes, which do actual work? – shx2 Sep 10 '15 at 09:43
  • @DanD. thanks - so would the tasks be used by any of the processes (4 in my case) once another task is done? (they are queued to be processed?) – WoJ Sep 10 '15 at 10:34
  • @shx2: *You got the meaning of maxtasksperchild all wrong* -- probably, but this is not very helpful. As for the second part - I was using an **example** to try to understand the relationship between the two parameters, this is obviously not real prod code. – WoJ Sep 10 '15 at 10:34
  • Yes, each process would get another task when it has finished its current one if one is available. Inside the Pool is a task queue. – Dan D. Sep 10 '15 at 15:23

1 Answer

As the documentation says (and as you quoted in your question):

processes is the number of worker processes that can run in parallel. If it is not set, it defaults to the number of CPUs in your machine.

maxtasksperchild is the maximum number of tasks each worker process may handle. Once a worker has finished maxtasksperchild tasks, it exits, and a new process is started and added to the Pool in its place. It does not change how many workers run at the same time.

Consider the following code:

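import os
import sys
from multiprocessing import Pool

def f(x):
    # Report which worker process handles this task.
    print("pid: ", os.getpid(), " deal with ", x)
    sys.stdout.flush()

if __name__ == '__main__':
    # 4 workers; each is replaced after completing 2 tasks.
    pool = Pool(processes=4, maxtasksperchild=2)
    keys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    result = pool.map(f, keys)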

Here we use 4 processes, and each of them is killed after it has executed 2 tasks. Running the code prints something like:

pid:  10899  deal with  1
pid:  10900  deal with  2
pid:  10901  deal with  3
pid:  10899  deal with  5
pid:  10900  deal with  6
pid:  10901  deal with  7
pid:  10902  deal with  4
pid:  10902  deal with  8
pid:  10907  deal with  9
pid:  10907  deal with  10

Processes 10899 to 10902 each exit after executing 2 tasks, and a new process, 10907, is started to execute the last two.
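If you would rather check this programmatically than read the log, here is a minimal sketch. The names `report_pid` and `tasks_per_worker` are made up for illustration, and it assumes Python 3. It passes `chunksize=1` explicitly because, as far as I understand the implementation, `map` hands out work in chunks and a whole chunk counts as one task.

```python
import os
from collections import Counter
from multiprocessing import Pool


def report_pid(x):
    # Return the PID of the worker process that handled this task.
    return os.getpid()


def tasks_per_worker(n_tasks, processes, maxtasksperchild=None):
    # chunksize=1 makes every element its own task; with a larger
    # chunksize, a whole chunk would count as a single task.
    with Pool(processes=processes, maxtasksperchild=maxtasksperchild) as pool:
        pids = pool.map(report_pid, range(n_tasks), chunksize=1)
    # Map each worker PID to the number of tasks it handled.
    return Counter(pids)


if __name__ == '__main__':
    counts = tasks_per_worker(10, processes=4, maxtasksperchild=2)
    for pid, n in sorted(counts.items()):
        print("pid", pid, "handled", n, "task(s)")
```

With `maxtasksperchild=2`, no PID should appear more than twice, so 10 tasks need at least 5 distinct worker processes over the lifetime of the pool.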

By comparison, if we use a larger maxtasksperchild or the default value (None, which means the worker processes are never replaced and live as long as the Pool), as in the following code:

if __name__ == '__main__':
    pool = Pool(processes=4, maxtasksperchild=10)
    keys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    result = pool.map(f, keys)

The result:

pid:  13352  deal with  1
pid:  13353  deal with  2
pid:  13352  deal with  4
pid:  13354  deal with  3
pid:  13353  deal with  6
pid:  13352  deal with  7
pid:  13355  deal with  5
pid:  13354  deal with  8
pid:  13353  deal with  9
pid:  13355  deal with  10

As you can see, no new processes are created, and all tasks are handled by the original 4 processes.

Hope this helps.
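To see that `processes` alone caps how many tasks run at the same time, here is a rough sketch that timestamps each task and computes the peak overlap. The helper names (`timed_task`, `max_concurrency`, `peak_parallel_tasks`) and the 0.2 s sleep are placeholders I made up to stand in for real work.

```python
import time
from multiprocessing import Pool


def timed_task(x):
    # Hypothetical workload: record when the task starts and ends.
    start = time.time()
    time.sleep(0.2)
    return start, time.time()


def max_concurrency(intervals):
    # Sweep over start (+1) and end (-1) events to find the peak overlap.
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    running = peak = 0
    # At equal timestamps, ends sort before starts (-1 < 1).
    for _, delta in sorted(events):
        running += delta
        peak = max(peak, running)
    return peak


def peak_parallel_tasks(n_tasks, processes):
    with Pool(processes=processes) as pool:
        intervals = pool.map(timed_task, range(n_tasks), chunksize=1)
    return max_concurrency(intervals)


if __name__ == '__main__':
    print("peak parallel tasks:", peak_parallel_tasks(12, processes=4))
```

The peak should never exceed `processes`, however many tasks are submitted. So to have at most 4 instances running, set `processes=4`; `maxtasksperchild` only decides how often those workers are replaced.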

linpingta
  • Yes it is, thanks. I just wanted to make sure (also from @DanD.'s comment) that the tasks are queued up, waiting for a free process to pick them up, with only `processes` such processes alive at any time (?) – WoJ Sep 10 '15 at 10:36
  • I am not sure about the source code, but I guess tasks are queued in a structure like multiprocessing.Queue. It's a multi-producer, multi-consumer pattern, so the queue has to be shared between the processes. I think the processes stay alive as long as the pool manages them. – linpingta Sep 10 '15 at 10:50