
I want to execute some processes in parallel and wait until they finish. So I wrote this code:

import multiprocessing as mp

pool = mp.Pool(5)
for a in table:
    pool.apply(func, args=(some_args,))
pool.close()
pool.join()

Will I get 5 processes executing func in parallel here? Or is apply_async the only option?

mnowotka

2 Answers


The docs are quite clear on this: each call to apply blocks until the result is ready. Use apply_async.
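
A minimal sketch of the non-blocking variant, assuming a stand-in `func` (the question's real `func` and arguments aren't shown): `apply_async` submits every job up front, and `close`/`join` wait for them all.

import multiprocessing as mp

def func(x):
    return x * 2  # stand-in for the real work

if __name__ == '__main__':
    pool = mp.Pool(5)
    # apply_async returns an AsyncResult immediately, so up to 5
    # calls run in parallel; apply would block on each call instead.
    async_results = [pool.apply_async(func, args=(a,)) for a in range(10)]
    pool.close()
    pool.join()  # wait until every submitted call has finished
    print([r.get() for r in async_results])  # collect the return values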

Janne Karila
  • So why do I need a pool? Why do I need multiprocessing at all? With apply_async you have to specify a callback. I want something that executes processes in parallel, waits for them all, and then does something else. In particular, I have a table of tables of tasks: I want to iterate through the table, execute each chunk of processes in parallel, wait for that chunk to finish, then execute the next chunk, and so on. With callbacks this is extremely complicated... – mnowotka Jun 26 '13 at 11:31
  • @mnowotka No, the callback is optional. You can access the return value of `func` either via the callback or by calling `get` on the object returned by `apply_async`. In your example you don't use the return value at all. In that case you could use `apply_async` without further changes. – Janne Karila Jun 26 '13 at 11:54
  • No, the callback is also how you get the information that all the processes have finished. Even if I don't care about return values, in order to start another chunk of processes I need to know that the previous chunk has finished. Implementing this with callbacks is a real pain; that's why JavaScript has many implementations of promises, where you can say when(task1, task2).then(task3). – mnowotka Jun 26 '13 at 12:13
  • @mnowotka You could consider `pool.map` instead. – Janne Karila Jun 26 '13 at 12:29
  • Does map execute processes in parallel? If so, why is there map_async? – mnowotka Jun 26 '13 at 12:32
  • @mnowotka `map(...)` waits for the result, equivalent to `map_async(...).get()`. Unlike `apply`, `map` is able to initiate parallel work because it calls the function multiple times, once for each item in the iterable argument (see the sketch after these comments). – Janne Karila Jun 26 '13 at 12:36
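
A minimal sketch of the chunked pattern discussed in these comments, assuming a hypothetical `table` whose inner lists are chunks of task arguments: each `pool.map` call blocks until its whole chunk is done, so chunks run one after another while the tasks inside a chunk run in parallel.

import multiprocessing as mp

def func(arg):
    return arg * 2  # stand-in for the real per-task work

if __name__ == '__main__':
    table = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]  # hypothetical chunks
    pool = mp.Pool(5)
    for chunk in table:
        # map blocks until every task in this chunk has finished;
        # the tasks within the chunk run in parallel on the pool
        results = pool.map(func, chunk)
        # ...use results here before the next chunk starts...
    pool.close()
    pool.join()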

Another solution is to use `Pool.imap_unordered()`.

The following code starts a pool of 5 workers, then sends three jobs to the pool: the first with num=1, the second with num=2, and so on. imap_unordered yields each result as soon as it is ready, from whichever worker finishes first, so the loop prints the results as they appear, not in any specific order.

import multiprocessing

def calc(num):
    # the work done in a worker process
    return num * 2

if __name__ == '__main__':
    pool = multiprocessing.Pool(5)
    # results are yielded in completion order, not input order
    for output in pool.imap_unordered(calc, [1, 2, 3]):
        print('output:', output)
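
Since `calc` doubles its input, the three lines printed are output: 2, output: 4, and output: 6, in whichever order the workers happen to finish.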
johntellsall