
I want to run 15 commands but only run 3 at a time

testme.py

import multiprocessing
import time
import random
import subprocess

def popen_wrapper(i):
    p = subprocess.Popen(['echo', 'hi'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = p.communicate()
    print(stdout)
    time.sleep(random.randint(5, 20))  # pretend it's doing some work
    return p.returncode

num_to_run = 15
max_parallel = 3

running = []
for i in range(num_to_run):
    p = multiprocessing.Process(target=popen_wrapper, args=(i,))
    running.append(p)
    p.start()

    if len(running) >= max_parallel:
        # blocking wait - join on whoever finishes first then continue
    else:
        # nonblocking wait - see if any process is finished; if so, join the finished processes

I'm not sure how to implement the comments in:

if len(running) >= max_parallel:
    # blocking wait - join on whoever finishes first then continue
else:
    # nonblocking wait - see if any process is finished; if so, join the finished processes

I would NOT be able to do something like:

for p in running:
   p.join()

because the second process in running might already have finished while I'm still blocked on the first one.

Question: how do you check whether the processes in running have finished, both blocking and non-blocking (i.e., find the first one to finish)?

I'm looking for something similar to waitpid, maybe.
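
The closest stdlib analogue I've found so far (assuming Python 3.3+, where Process.sentinel and multiprocessing.connection.wait exist) would be a sketch like the one below; reap_finished is just a name I made up, and I haven't verified this against the real workload:

from multiprocessing.connection import wait

def reap_finished(running, block):
    # block=True: wait until at least one process exits (waitpid-like).
    # block=False: return immediately with whatever is already done.
    if block and running:
        wait([p.sentinel for p in running])  # blocks until >= 1 sentinel is ready
    done = [p for p in running if not p.is_alive()]
    for p in done:
        p.join()            # reap the finished process
        running.remove(p)
    return done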

ealeon
    Can you make a successfully finished process send your watching process a message via a queue? – 9000 Dec 11 '15 at 19:48

1 Answer


Perhaps the easiest way to arrange this is to use a multiprocessing.Pool (mp below is shorthand for import multiprocessing as mp):

pool = mp.Pool(3)

will set up a pool with 3 worker processes. Then you can send 15 tasks to the pool:

for i in range(num_to_run):
    pool.apply_async(popen_wrapper, args=(i,), callback=log_result)

and all the machinery necessary to coordinate the 3 workers and 15 tasks is taken care of by mp.Pool.


Using mp.Pool:

import multiprocessing as mp
import time
import random
import subprocess
import logging
logger = mp.log_to_stderr(logging.WARN)

def popen_wrapper(i):
    logger.warn('echo "hi"')
    return i

def log_result(retval):
    results.append(retval)

if __name__ == '__main__':

    num_to_run = 15
    max_parallel = 3
    results = []

    pool = mp.Pool(max_parallel)
    for i in range(num_to_run):
        pool.apply_async(popen_wrapper, args=(i,), callback=log_result)
    pool.close()
    pool.join()

    logger.warn(results)

yields

[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-2] echo "hi"
[WARNING/MainProcess] [0, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12, 14, 13, 1]

The logging statements show which PoolWorker handles each task, and the last logging statement shows the MainProcess has received the return values from the 15 calls to popen_wrapper.
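
A side note on ordering: apply_async delivers results in completion order (the callback fires as each task finishes), which is why the list above is shuffled. If you need results in submission order, pool.map is a natural alternative; here is a minimal sketch, with a trivial popen_wrapper standing in for the real worker:

import multiprocessing as mp

def popen_wrapper(i):
    return i  # stand-in for the real work

if __name__ == '__main__':
    pool = mp.Pool(3)
    # map() blocks until every task finishes and returns the
    # results in submission order, not completion order.
    results = pool.map(popen_wrapper, range(15))
    pool.close()
    pool.join()
    print(results)  # [0, 1, 2, ..., 14]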


If you'd like to do it without a Pool, you could set up one mp.Queue for tasks and another mp.Queue for return values:

Using mp.Process and mp.Queues:

import multiprocessing as mp
import time
import random
import subprocess
import logging
logger = mp.log_to_stderr(logging.WARN)

SENTINEL = None
def popen_wrapper(inqueue, outqueue):
    for i in iter(inqueue.get, SENTINEL):
        logger.warn('echo "hi"')
        outqueue.put(i)

if __name__ == '__main__':

    num_to_run = 15
    max_parallel = 3

    inqueue = mp.Queue()
    outqueue = mp.Queue()
    procs = [mp.Process(target=popen_wrapper, args=(inqueue, outqueue)) 
             for i in range(max_parallel)]

    for p in procs:
        p.start()
    for i in range(num_to_run):
        inqueue.put(i)
    for i in range(max_parallel):
        # Put sentinels in the queue to tell `popen_wrapper` to quit
        inqueue.put(SENTINEL)

    for p in procs:
        p.join()

    results = [outqueue.get() for i in range(num_to_run)]
    logger.warn(results)

Notice that if you use

procs = [mp.Process(target=popen_wrapper, args=(inqueue, outqueue)) 
         for i in range(max_parallel)]

then you enforce there being exactly max_parallel (e.g. 3) worker processes. You then send all 15 tasks to one Queue:

for i in range(num_to_run):
    inqueue.put(i)

and let the worker processes pull tasks out of the queue:

def popen_wrapper(inqueue, outqueue):
    for i in iter(inqueue.get, SENTINEL):
        logger.warn('echo "hi"')
        outqueue.put(i)
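
The two-argument form of iter(callable, sentinel) calls inqueue.get() repeatedly and stops as soon as the returned value equals SENTINEL, so the loop above is roughly equivalent to:

def popen_wrapper(inqueue, outqueue):
    while True:
        i = inqueue.get()
        if i == SENTINEL:  # iter() stops on equality, the same check as here
            break
        logger.warn('echo "hi"')
        outqueue.put(i)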

You may also find Doug Hellmann's multiprocessing tutorial of interest. Among the many instructive examples there is an ActivePool recipe which shows how to spawn 10 processes and yet limit them (using an mp.Semaphore) so that only 3 are active at any given time. While that may be instructive, it may not be the best solution in your situation, since there doesn't appear to be a reason why you'd want to spawn more than 3 processes.
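
For reference, a minimal sketch of that semaphore idea (not Hellmann's ActivePool code itself; the worker below is made up): spawn all the processes up front and let an mp.Semaphore(3) gate how many do work at once:

import multiprocessing as mp

def popen_wrapper(i, sema):
    with sema:  # blocks until one of the 3 slots frees up
        print('task %d running' % i)

if __name__ == '__main__':
    sema = mp.Semaphore(3)  # at most 3 holders at a time
    procs = [mp.Process(target=popen_wrapper, args=(i, sema))
             for i in range(15)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()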

unutbu