
I am not sure when to use a pool of workers vs. multiple processes.

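from multiprocessing import Process

if __name__ == '__main__':
    processes = []

    # start 4 processes (range(1, 5) yields 1, 2, 3, 4)
    for m in range(1, 5):
        p = Process(target=some_function)
        p.start()
        processes.append(p)

    # wait for all of them to finish
    for p in processes:
        p.join()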

vs

from multiprocessing import Pool

if __name__ == '__main__':
    # start 4 worker processes
    with Pool(processes=4) as pool:
        pool_outputs = pool.map(another_function, inputs)
Pynchia
whiteSkar

2 Answers


As PyMOTW (Python Module of the Week) puts it:

The Pool class can be used to manage a fixed number of workers for simple cases where the work to be done can be broken up and distributed between workers independently.

The return values from the jobs are collected and returned as a list.

The pool arguments include the number of processes and a function to run when starting the task process (invoked once per child).

Please have a look at the examples given there to better understand its applications, functionality and parameters.
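As a rough sketch of the points in that quote (a fixed number of workers, an initializer invoked once per child, and return values collected into a list), something like the following should work; run, job, init_worker and the offset parameter are hypothetical names chosen for illustration, not anything from PyMOTW:

```python
from multiprocessing import Pool

_offset = 0  # per-process state, set by the initializer in each child

def init_worker(offset):
    # Called once in each child process when the pool starts it
    global _offset
    _offset = offset

def job(x):
    # One independent unit of work: one input element in, one result out
    return x * x + _offset

def run(inputs, workers=4, offset=0):
    # processes= fixes the number of workers; initializer/initargs are
    # invoked once per child, not once per task
    with Pool(processes=workers, initializer=init_worker,
              initargs=(offset,)) as pool:
        # map() distributes the inputs and returns the results as a list,
        # in the same order as the inputs
        return pool.map(job, inputs)

if __name__ == "__main__":
    print(run(range(5), offset=100))  # [100, 101, 104, 109, 116]
```

The `if __name__ == "__main__":` guard matters because start methods such as spawn re-import the main module in each child.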

Basically, Pool is a helper that eases the management of the worker processes in cases where all they need to do is consume common input data, process it in parallel and produce a joint output.

Pool does quite a few things that you would otherwise have to code yourself (not too hard, but it's still convenient to have a ready-made solution), i.e.:

  • the splitting of the input data
  • the target process function is simplified: it can be designed to expect a single input element. The Pool calls it with each element of the subset allocated to that worker
  • waiting for the workers to finish their job (i.e. joining the processes)
  • ...
  • merging the output of each worker to produce the final output
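To make that list concrete, here is a rough hand-rolled equivalent of pool.map built only from Process and Queue. It is a simplified sketch, not how Pool is implemented internally (Pool reuses long-lived workers and a task queue); manual_map, worker and square are hypothetical names:

```python
from multiprocessing import Process, Queue

def square(x):
    # Hypothetical per-element job
    return x * x

def worker(func, index, chunk, out_q):
    # Apply func to each element of this worker's chunk and send the
    # results back tagged with the chunk index, so they can be reordered
    out_q.put((index, [func(item) for item in chunk]))

def manual_map(func, data, workers=4):
    data = list(data)
    # 1. Split the input into contiguous chunks (ceiling division, at least 1)
    size = -(-len(data) // workers) or 1
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    out_q = Queue()
    procs = [Process(target=worker, args=(func, i, c, out_q))
             for i, c in enumerate(chunks)]
    for p in procs:
        p.start()
    # 2. Collect one message per chunk; draining the queue before join()
    #    avoids a child blocking while it flushes a large result
    results = dict(out_q.get() for _ in procs)
    # 3. Wait for the workers to finish (join the processes)
    for p in procs:
        p.join()
    # 4. Merge the partial outputs back into input order
    return [y for i in range(len(chunks)) for y in results[i]]

if __name__ == "__main__":
    print(manual_map(square, range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

With Pool, all of the above collapses into a single `pool.map(square, range(10))` call.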
Pynchia
tl;dr version: use Pool for an easy implementation of data parallelism. Not generally applicable to task parallelism. – RobertB Oct 30 '15 at 22:58
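For contrast, task parallelism (different functions running concurrently) maps more naturally onto individual Process objects, each with its own target. A minimal sketch, assuming two hypothetical tasks (count_words and count_vowels) that work on the same input and report back through a Queue:

```python
from multiprocessing import Process, Queue

def count_words(text, out_q):
    # Task A: one kind of work
    out_q.put(("words", len(text.split())))

def count_vowels(text, out_q):
    # Task B: a different kind of work on the same input
    out_q.put(("vowels", sum(ch in "aeiou" for ch in text.lower())))

def run_tasks(text):
    out_q = Queue()
    # Each task gets its own Process with its own target function
    procs = [Process(target=f, args=(text, out_q))
             for f in (count_words, count_vowels)]
    for p in procs:
        p.start()
    results = dict(out_q.get() for _ in procs)  # drain before joining
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    print(run_tasks("Pool is for data parallelism"))
```

The order in which the two results arrive is not fixed, which is why they are gathered into a dict keyed by task name.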

The information below might help you understand the difference between Pool and Process in Python's multiprocessing module:

Pool:

  1. When you have a large chunk of data, you can use the Pool class.
  2. Only the processes currently executing are kept in memory (the pool has a fixed number of workers; pending tasks wait in a queue).
  3. I/O operations: a worker waiting on I/O stays blocked until the operation completes, and the pool does not schedule another process in its place. This might increase the execution time.
  4. Uses a FIFO scheduler.
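Point 2 can be checked empirically: submit many more tasks than workers and count how many distinct OS processes actually ran them. A small sketch (which_worker and distinct_workers are hypothetical names); with the default maxtasksperchild=None the same workers are reused, so the count should never exceed the pool size:

```python
import os
from multiprocessing import Pool

def which_worker(_):
    # Report which OS process handled this task
    return os.getpid()

def distinct_workers(n_tasks, workers=4):
    with Pool(processes=workers) as pool:
        # Far more tasks than workers: they wait in the pool's task queue
        # and are handed out to the same fixed set of processes
        pids = pool.map(which_worker, range(n_tasks), chunksize=1)
    return len(set(pids))

if __name__ == "__main__":
    print(distinct_workers(100))  # at most 4
```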

Process:

  1. When you have a small amount of data or functions, and fewer repetitive tasks to do.
  2. It puts all the processes in memory at once. Hence, for larger tasks, it might cause you to run out of memory.
  3. I/O operations: a process blocked on I/O is suspended and another process is scheduled to run in parallel.
  4. Uses a FIFO scheduler.
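By contrast, each Process object is its own OS process. The sketch below (spawn_all and report_pid are hypothetical names) starts n processes and uses a multiprocessing.Barrier so that no child can finish until all n are running, which makes it explicit that they all occupy memory at the same time:

```python
import os
from multiprocessing import Barrier, Process, Queue

def report_pid(barrier, out_q):
    # Block until all n children have started, then report this PID
    barrier.wait()
    out_q.put(os.getpid())

def spawn_all(n):
    barrier = Barrier(n)
    out_q = Queue()
    # One Process per task: all n children are alive simultaneously,
    # so memory use grows with n
    procs = [Process(target=report_pid, args=(barrier, out_q))
             for _ in range(n)]
    for p in procs:
        p.start()
    pids = [out_q.get() for _ in procs]
    for p in procs:
        p.join()
    return len(set(pids))

if __name__ == "__main__":
    print(spawn_all(8))  # 8
```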
ANK