16

I am running a multiprocessing pool in Python with ~2000 tasks being mapped to 24 workers. Each task creates a file based on some data analysis and web services.

I want to run a new task once all the tasks in the pool have finished. How can I tell when all the processes in the pool have finished?

Dror Hilman
  • 6,837
  • 9
  • 39
  • 56

2 Answers

19

You want to use the join method, which blocks the main process from moving forward until all sub-processes have finished:

Block the calling thread until the process whose join() method is called terminates or until the optional timeout occurs.

from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    processes = []
    for i in range(10):
        p = Process(target=f, args=('bob',))
        processes.append(p)

    # Start every process first so they actually run in parallel...
    for p in processes:
        p.start()

    # ...then join each one, which blocks until that process has exited.
    for p in processes:
        p.join()

    # We only get here once all processes have finished.
    print('finished!')

EDIT:

To use join with a Pool:

    from multiprocessing import Pool

    pool = Pool(processes=4)  # start 4 worker processes
    # submit several tasks so the workers actually run in parallel
    results = [pool.apply_async(f, (i,)) for i in range(20)]
    pool.close()  # no more tasks can be submitted after this point
    pool.join()   # block at this line until all worker processes are done
    print("completed")
Martin Konecny
  • 57,827
  • 19
  • 139
  • 159
  • Thanks, but I am asking about the pool method, where you let multiprocessing start the processes automatically. How can you do this "join" trick with the pool? – Dror Hilman May 19 '15 at 05:42
  • Ok updated answer. You just call `join()` on the `pool` instance. – Martin Konecny May 19 '15 at 05:46
  • 2
    Note that you need to call `pool.close()` or `pool.terminate()` before you can call `pool.join()`, so the example above won't actually work. Also note that using `join()` to tell when the work is done is only a viable option if you don't need to use the pool anymore afterward, since it requires closing or terminating the pool. – dano May 19 '15 at 17:34
  • How do you re-open the pool after pool.join()? – machen Sep 26 '17 at 11:41
  • 2
    If you want the processes to run in parallel you need to first call start() on all the processes and then call join. – Jonatan Dec 20 '18 at 13:52
  • 4
    This accepted answer is **NOT** running in parallel, and is therefore not a valid answer. – José L. Patiño Apr 06 '20 at 00:13
  • Each `apply_async` call runs on one worker process only!! So this answer is **NOT** running in parallel! – Yahya Jan 23 '21 at 18:03
0

You can use the wait() method of the AsyncResult object (which is what pool.apply_async returns).

import multiprocessing

def create_file(i):
    open(f'{i}.txt', 'a').close()

if __name__ == '__main__':
    # By default, Pool() creates os.cpu_count() worker processes
    with multiprocessing.Pool() as pool:

        # Launch the first round of tasks, building a list of ApplyResult objects
        results = [pool.apply_async(create_file, (i,)) for i in range(50)]
    
        # Wait for every task to finish
        for result in results:
            result.wait()

        # {start your next task... the pool is still available}

    # {when you reach here, the pool is closed}

This method works even if you're planning on using your pool again and don't want to close it yet; for example, you might want to keep it around for the next iteration of your algorithm. Use a with statement, or call pool.close() and pool.join() manually when you're done with it; otherwise the worker processes are left running in the background until the program exits.
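
For illustration, here is a minimal sketch of that reuse pattern (a sketch, not part of the original answer): wait on one round of tasks, then submit another round to the same, still-open pool.

import multiprocessing

def create_file(i):
    open(f'{i}.txt', 'a').close()

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        for round_number in range(3):  # hypothetical outer iterations of an algorithm
            # Submit one round of tasks to the same pool.
            results = [pool.apply_async(create_file, (round_number * 50 + i,))
                       for i in range(50)]

            # Block until every task in this round has finished.
            for result in results:
                result.wait()

            print(f'round {round_number} done')
    # Exiting the with block terminates the pool.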