11

Here is a complete, simple working example:

import multiprocessing as mp
import time
import random


class Foo:
    def __init__(self):
        # some expensive set up function in the real code
        self.x = 2
        print('initializing')

    def run(self, y):
        time.sleep(random.random() / 10.)
        return self.x + y


def f(y):
    foo = Foo()
    return foo.run(y)


def main():
    pool = mp.Pool(4)
    for result in pool.map(f, range(10)):
        print(result)
    pool.close()
    pool.join()


if __name__ == '__main__':
    main()

How can I modify it so that Foo is initialized only once per worker, rather than once per task? Basically, I want __init__ called 4 times, not 10. I am using Python 3.5.

Cœur
user2133814

2 Answers

15

The intended way to handle this is via the optional initializer and initargs arguments to the Pool() constructor. They exist precisely to give you a way to run setup code exactly once when each worker process is created. So, e.g., add:

def init():
    global foo
    foo = Foo()

and change the Pool creation to:

pool = mp.Pool(4, initializer=init)

If you needed to pass arguments to your per-process initialization function, then you'd also add an appropriate initargs=... argument.
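As a minimal sketch of the initargs case, the tuple given as initargs is unpacked into the initializer's parameters in each worker. The names init, add_offset and _offset below are hypothetical and only serve the illustration:

```python
import multiprocessing as mp


def init(offset):
    # Called once per worker process with the values from initargs.
    global _offset
    _offset = offset


def add_offset(y):
    # Reads the per-process global that init() bound.
    return _offset + y


def main():
    # initargs must be a tuple, even for a single argument.
    with mp.Pool(2, initializer=init, initargs=(100,)) as pool:
        print(pool.map(add_offset, range(3)))


if __name__ == '__main__':
    main()
```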

Note: of course you should also remove the

foo = Foo()

line from f(), so that your function uses the global foo created by init().
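Putting those changes together with the code from the question gives something like the sketch below. The os.getpid() call in the print is an addition for illustration only, to make it visible which process ran the setup; with 4 workers the message should appear once per worker rather than once per task.

```python
import multiprocessing as mp
import os


class Foo:
    def __init__(self):
        # Stand-in for the expensive setup in the real code.
        self.x = 2
        print('initializing in process', os.getpid())

    def run(self, y):
        return self.x + y


def init():
    # Runs once in each worker process when the pool creates it.
    global foo
    foo = Foo()


def f(y):
    # Uses the per-process global bound by init(); no Foo() call here.
    return foo.run(y)


def main():
    with mp.Pool(4, initializer=init) as pool:
        for result in pool.map(f, range(10)):
            print(result)


if __name__ == '__main__':
    main()
```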

Tim Peters
  • Can you please explain the global keyword in this context? I saw the initializer in the docs but didn't know about "global", so I didn't see how to make it work. Thanks – user2133814 Aug 05 '16 at 19:04
  • See my edit just now: you also need to _use_ the `foo` created by the initialization function. The initialization function runs once and ends, so any change it makes has to be visible at global scope so that other functions later (such as invoked by `map()`) can benefit. – Tim Peters Aug 05 '16 at 19:06
  • No. Nothing is shared across processes. `global` has been in Python since day 1, many years before `multiprocessing` was even an idea for a module. `global` has nothing to do with processes (or threads). In context, it's simply telling `init()` to bind `foo` in the module's global scope instead of in (the default) `init`'s local scope. In multiprocessing, each process has its own, distinct module global namespace. – Tim Peters Aug 05 '16 at 19:17
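The scoping point in the comments above can be seen without any processes at all. This is a small single-process sketch; the names value, bind_local and bind_global are hypothetical:

```python
value = None  # module-level (global) name


def bind_local():
    # Without `global`, the assignment creates a new local name
    # that disappears when the function returns.
    value = 'set locally'


def bind_global():
    # With `global`, the assignment rebinds the module-level name.
    global value
    value = 'set globally'


bind_local()
after_local = value    # still None: the global was not touched
bind_global()
after_global = value   # now 'set globally'
print(after_local, after_global)
```

In a pool, each worker process runs init() against its own copy of the module namespace, which is why the binding is per-process rather than shared.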
3

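The most obvious alternative is to load it lazily:

_foo = None  # per-process cache; each worker has its own copy

def f(y):
    global _foo
    if _foo is None:
        # First task in this process: build Foo once, then reuse it.
        _foo = Foo()
    return _foo.run(y)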
ykhrustalev