
I have a simple Python multiprocessing script that sets up a pool of workers that append their output to a Manager list. The script has three levels of calls: main calls f1, which spawns several worker processes that call another function, g1. When one attempts to debug the script (incidentally on Windows 7/64-bit/VS 2010/PyTools), it runs into a nested process-creation loop, spawning an endless number of processes. Can anyone determine why? I'm sure I am missing something very simple. Here's the problematic code:

import multiprocessing
import logging

manager = multiprocessing.Manager()
results = manager.list()

def g1(x):
    y = x*x
    print "processing: y = %s" % y
    results.append(y)

def f1():
    logger = multiprocessing.log_to_stderr()
    logger.setLevel(multiprocessing.SUBDEBUG)

    pool = multiprocessing.Pool(processes=4)
    for i in range(15):
        pool.apply_async(g1, [i])
    pool.close()
    pool.join()

def main():
    f1()

if __name__ == "__main__":
    main()

PS: I tried adding multiprocessing.freeze_support() to main, to no avail.

Vadim Kotov
Blair Azzopardi
  • Not sure why it would cause what you are seeing, but the example using `multiprocessing.Manager` in the docs creates the `Manager` in the `if __name__ == "__main__":` block and passes the managed resources to the workers as explicit parameters. Have you tried doing it that way instead? My gut feeling is that it has something to do with the unpickling process when the manager object is created at module scope (something like spawning a new manager process every time a new process is created, including when a new manager process is created, thus infinite recursion). – Silas Ray Aug 13 '12 at 16:19
  • I can't reproduce the issue with this code. It works correctly for me -- but I'm not using Windows. – senderle Aug 13 '12 at 16:41
  • I tried your example with CPython 2.7.3 on Linux and it works. – user1202136 Aug 13 '12 at 16:53
  • Definitely using CPython on Windows but meant to say Pytools not Pyvot. http://pytools.codeplex.com/ – Blair Azzopardi Aug 13 '12 at 17:04

1 Answer


Basically, what sr2222 mentions in his comment is correct. The multiprocessing manager docs say that the `__main__` module must be importable by the children, and that each Manager "object corresponds to a spawned child process". So each child essentially re-imports your module (you can see this by adding a print statement at module scope to my fixed version!)... which leads to infinite recursion when the Manager itself lives at module scope.
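The re-import behaviour is easy to observe with a minimal sketch (not the poster's code; Python 3 syntax; on Windows, the "spawn" start method means every child re-runs the module's top-level statements). Anything unguarded at module scope, such as creating a `Manager`, therefore runs once per child:

```python
import multiprocessing

# This line runs on every import. Under the Windows "spawn" start
# method, each child process re-imports this module, so it prints
# once per process.
print("importing module in:", multiprocessing.current_process().name)

def square(x):
    return x * x

if __name__ == "__main__":
    # Guarded: only the parent ever reaches this block, so the pool
    # (and anything expensive like a Manager) is created exactly once.
    pool = multiprocessing.Pool(processes=2)
    print(pool.map(square, [1, 2, 3]))
    pool.close()
    pool.join()
```

Moving the `Manager()` call out of module scope and behind the guard (or into a function only the parent calls) breaks the recursion.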

One solution would be to move your manager code into f1():

import multiprocessing
import logging

def g1(results, x):
    y = x*x
    print "processing: y = %s" % y
    results.append(y)

def f1():
    logger = multiprocessing.log_to_stderr()
    logger.setLevel(multiprocessing.SUBDEBUG)
    manager = multiprocessing.Manager()
    results = manager.list()
    pool = multiprocessing.Pool(processes=4)
    for i in range(15):
        pool.apply_async(g1, [results, i])
    pool.close()
    pool.join()


def main():
    f1()

if __name__ == "__main__":
    main()
Gerrat
  • That's where I started. However, placing the manager and results in f1 means they're not available in g1. One gets a NameError instead. – Blair Azzopardi Aug 13 '12 at 17:07
  • @bsdz: that's why I passed `results` in (if you noticed). ...if you need `manager`, you could pass it in as well. ...or you could create a class and store these at the instance level. – Gerrat Aug 13 '12 at 17:09
  • Ah, yes I missed that. Must admit I was hoping I wouldn't have to pass it around like that but that is how it's documented. Thanks! – Blair Azzopardi Aug 13 '12 at 17:16
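For what it's worth, if passing `results` through every call feels clumsy, a common alternative is the pool's `initializer` hook, which stashes the managed list in a per-worker global once at worker startup. A hedged sketch in modern Python 3 syntax (`_init_worker` and `_results` are names invented here, not from the thread):

```python
import multiprocessing

def _init_worker(shared_results):
    # Runs once in each worker process: keep the managed list in a
    # module-level global so g1 can reach it without a parameter.
    global _results
    _results = shared_results

def g1(x):
    y = x * x
    _results.append(y)
    return y

def f1():
    manager = multiprocessing.Manager()
    shared = manager.list()
    pool = multiprocessing.Pool(processes=4,
                                initializer=_init_worker,
                                initargs=(shared,))
    pool.map(g1, range(15))
    pool.close()
    pool.join()
    return sorted(shared)

if __name__ == "__main__":
    print(f1())  # squares of 0..14, in order
```

The Manager is still created inside a function, so the module stays safely importable by the children.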