
I have a large codebase to parallelise. I can avoid rewriting the method signatures of hundreds of functions by using a single global queue. I know it's messy; please don't tell me that using globals means I'm doing something wrong, because in this case it really is the easiest choice. The code below works, but I don't understand why. I declare a global multiprocessing.Queue() but never declare that it should be shared between processes (for example by passing it as a parameter to the worker). Does Python automatically place this queue in shared memory? Is it safe to do this on a larger scale?

Note: you can tell the queue really is shared between the processes: the workers start on an empty queue and sit idle for one second until the main process pushes some work onto it.

import multiprocessing
import queue  # for queue.Empty
import time

outqueue = None


class WorkerProcess(multiprocessing.Process):
    def __init__(self):
        multiprocessing.Process.__init__(self)
        self.exit = multiprocessing.Event()

    def doWork(self):
        global outqueue
        ob = outqueue.get()
        ob = ob + "!"
        print(ob)
        time.sleep(1)  # simulate more hard work
        outqueue.put(ob)

    def run(self):
        while not self.exit.is_set():
            self.doWork()

    def shutdown(self):
        self.exit.set()


if __name__ == '__main__':
    outqueue = multiprocessing.Queue()

    procs = []
    for x in range(10):
        procs.append(WorkerProcess())
        procs[x].start()

    time.sleep(1)  # the workers sit idle on the empty queue for a second
    for x in range(20):
        outqueue.put(str(x))

    time.sleep(10)
    for p in procs:
        p.shutdown()

    for p in procs:
        p.join()

    # drain whatever is left on the queue
    try:
        while True:
            x = outqueue.get(False)
            print(x)
    except queue.Empty:
        print("done")
Michiel Ariens
  • I strongly urge reading the answer to [this question](http://stackoverflow.com/questions/11442892/python-multiprocessing-queue-failure) instead of the accepted answer below, which I think is completely wrong. – Ami Tavory May 31 '16 at 15:57

1 Answer


Assuming you're using Linux, the answer is in the way the OS creates a new process.

When a process creates a new one on Linux, it forks the parent. The result is a child process that starts with a copy of all the parent's memory, including module-level globals. Basically a clone.

In your example you instantiate the Queue first and create the new processes afterwards. Each child therefore inherits a copy of the queue object whose internals point at the same underlying pipe, so all of them can use it.

To see things break, create the processes first and the Queue afterwards: the children will still have the global variable set to None, while the parent will have a Queue.
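A minimal sketch of that failure mode (names are illustrative; it forces the fork start method explicitly so the example doesn't depend on the platform default):

import multiprocessing

outqueue = None

def worker():
    # Under fork, the child gets a snapshot of the parent's globals
    # taken at fork time; outqueue was still None back then.
    print("child sees:", outqueue)

if __name__ == '__main__':
    multiprocessing.set_start_method("fork")  # don't rely on the platform default
    p = multiprocessing.Process(target=worker)
    p.start()                           # the fork happens here

    outqueue = multiprocessing.Queue()  # too late: the child has already forked
    p.join()                            # the child prints: child sees: None
    print("parent sees:", outqueue)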

It is safe, though not recommended, to share a Queue through a global variable on Linux. On Windows, where new processes are spawned rather than forked, sharing a queue through a global variable won't work.
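You can reproduce the Windows behaviour on Linux by forcing the spawn start method. A minimal sketch (names are illustrative; under spawn the child starts a fresh interpreter instead of inheriting the parent's memory):

import multiprocessing

outqueue = None

def worker():
    # Under spawn the child re-imports this module from scratch, so
    # outqueue is reset to None here regardless of what the parent
    # assigned before start().
    print("child sees:", outqueue)

if __name__ == '__main__':
    multiprocessing.set_start_method("spawn")  # emulate Windows on Linux
    outqueue = multiprocessing.Queue()         # assigned before start(), but not inherited
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()  # the child prints: child sees: None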

As mentioned in the multiprocessing programming guidelines:

Explicitly pass resources to child processes

On Unix using the fork start method, a child process can make use of a shared resource created in a parent process using a global resource. However, it is better to pass the object as an argument to the constructor for the child process.

Apart from making the code (potentially) compatible with Windows and the other start methods this also ensures that as long as the child process is still alive the object will not be garbage collected in the parent process. This might be important if some resource is freed when the object is garbage collected in the parent process.
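Following that advice, the worker from the question could take the queue in its constructor instead of reaching into a global. A sketch of that rewrite (the timeout on get() is my addition, so that shutdown() also takes effect when the queue is empty):

import multiprocessing
import queue
import time


class WorkerProcess(multiprocessing.Process):
    def __init__(self, outqueue):
        multiprocessing.Process.__init__(self)
        self.outqueue = outqueue            # the queue travels with the process object
        self.exit = multiprocessing.Event()

    def run(self):
        while not self.exit.is_set():
            try:
                # Poll with a timeout so shutdown() works on an empty queue.
                ob = self.outqueue.get(timeout=1)
            except queue.Empty:
                continue
            print(ob + "!")
            time.sleep(1)                   # simulate more hard work

    def shutdown(self):
        self.exit.set()


if __name__ == '__main__':
    outqueue = multiprocessing.Queue()      # no global needed any more
    procs = [WorkerProcess(outqueue) for _ in range(10)]
    for p in procs:
        p.start()

    for x in range(20):
        outqueue.put(str(x))

    time.sleep(10)
    for p in procs:
        p.shutdown()
    for p in procs:
        p.join()

Because the queue is handed to each Process explicitly, this version also works with the spawn start method, and therefore on Windows.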

For more info about forking on Linux you can read the fork man page.

noxdafox
  • My apologies for the downvote, but I think that this is incorrect. From personal experience, this can lead to situations where a process can fail to `get` messages `put` into a queue by a different process. I suggest looking at the answer to [this question](http://stackoverflow.com/questions/11442892/python-multiprocessing-queue-failure). – Ami Tavory May 31 '16 at 16:00
  • "It is safe, yet not recommended, to share a Queue as a global variable.": it seems that it doesn't work on Windows. See my question on that: http://stackoverflow.com/questions/42734348/sharing-synchronization-objects-through-global-namespace-vs-as-a-function-argume – max Mar 11 '17 at 17:52
  • @max: as said in the answer, I was focusing on Linux. I updated the answer according to your comment. – noxdafox Mar 12 '17 at 13:54
  • @AmiTavory: don't really understand the meaning of your comment. The linked question does not provide enough code to reproduce the issue. multiprocessing.Queue objects are process safe. Nevertheless, like most synchronisation primitives, they are easy to misuse and misunderstand. It's not the tool, it's the way you use it ;) – noxdafox Mar 12 '17 at 13:54