I suspect that something like:

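import sys
from concurrent.futures import ProcessPoolExecutor, as_completed

@memoize  # assumed to be a typical dict-based cache decorator
def foo(arg):
    # placeholder for the expensive computation
    return something_expensive(arg)

def main():
    with ProcessPoolExecutor(10) as pool:
        # map each future back to the argument it was submitted with
        futures = {pool.submit(foo, arg): arg for arg in args}
        for future in as_completed(futures):
            arg = futures[future]
            try:
                result = future.result()
            except Exception as e:
                sys.stderr.write("Failed to run foo() on {}\nGot {}\n".format(arg, e))
            else:
                print(result)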

won't work (assuming @memoize is a typical dict-based cache), because I am using a multiprocessing pool and the worker processes don't share memory, so each one gets its own separate cache. At least it doesn't seem to work.

What is the correct way to memoize in this scenario? Ultimately I'd also like to pickle the cache to disk and load it on subsequent runs.

martineau
GL2014
  • Have you looked up anything? There are many questions on [sharing state](https://stackoverflow.com/questions/30264699/shared-state-in-multiprocessing-processes) using `multiprocessing`, or have you tried any of the approaches mentioned in the [docs](https://docs.python.org/3.7/library/multiprocessing.html#sharing-state-between-processes)? – juanpa.arrivillaga Jul 23 '19 at 00:46
  • Since memoizing requires storing the arguments and results of earlier calls, however it's implemented, it's very unlikely to work when the memoized function is being used in more than one process, since each one has its own separate memory space. It will benefit multiple calls _within_ each process, but the overall benefit will likely be reduced. That said, your pickling idea sounds like it might be worth pursuing, IMO, as the pickled data _could_ be used by more than one process if each one loads it. – martineau Jul 23 '19 at 01:12

1 Answer

You can use a Manager().dict() from multiprocessing: the Manager runs a server process that proxies access to a dict shared between the worker processes, and the dict's contents can be copied into a plain dict and pickled. I decided to use multithreading instead, though, because it's an I/O-bound app, and since threads share one memory space I don't need all that manager stuff; I can just use a plain dict.
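
For illustration, here is a minimal sketch of the thread-based variant, assuming the memoized function takes a single hashable argument. The names `memoize`, `load_cache`, `save_cache`, and `run` are hypothetical helpers written for this example, not part of any library. The cache is a plain dict guarded by a lock, and it is pickled to disk so a later run can reload it.

```python
import pickle
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from functools import wraps
from pathlib import Path


def memoize(cache, lock):
    """Hypothetical decorator: cache results in `cache`, a dict shared by all threads."""
    def decorator(func):
        @wraps(func)
        def wrapper(arg):
            with lock:
                if arg in cache:
                    return cache[arg]
            # Computed outside the lock so threads stay concurrent; two threads
            # may occasionally compute the same argument twice.
            result = func(arg)
            with lock:
                cache[arg] = result
            return result
        return wrapper
    return decorator


def load_cache(path):
    """Return the pickled cache stored at `path`, or an empty dict if none exists."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    return {}


def save_cache(cache, path):
    """Pickle the cache to `path` for use on subsequent runs."""
    with Path(path).open("wb") as fh:
        pickle.dump(cache, fh)


def run(func, args, workers=10):
    """Call `func` on every argument in a thread pool; return {arg: result}."""
    results = {}
    with ThreadPoolExecutor(workers) as pool:
        futures = {pool.submit(func, arg): arg for arg in args}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

A typical run would call `load_cache` at startup, decorate foo with `@memoize(cache, threading.Lock())`, pass it to `run`, and call `save_cache` before exiting. Only load a pickle file you wrote yourself, since unpickling untrusted data can execute arbitrary code.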

GL2014