
I'm writing a script that uses Python's multiprocessing and threading modules. For context: I spawn as many processes as there are cores available, and inside each process I start e.g. 25 threads. Each thread consumes from an input_queue and produces to an output_queue. For the queue objects I use multiprocessing.Queue.
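
Roughly, the layout looks like this (a sketch; worker_thread, process_main, the sentinel shutdown and the pass-through "work" are simplified placeholders, not my real code):

import multiprocessing
import threading

def worker_thread(input_queue, output_queue):
    # each thread consumes items until it sees a None sentinel
    while True:
        item = input_queue.get()
        if item is None:
            break
        output_queue.put(item)  # placeholder for the real work

def process_main(input_queue, output_queue):
    # each process hosts e.g. 25 consumer threads
    threads = [threading.Thread(target=worker_thread, args=(input_queue, output_queue))
               for _ in range(25)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    input_queue = multiprocessing.Queue()
    output_queue = multiprocessing.Queue()
    n_procs = multiprocessing.cpu_count()
    processes = [multiprocessing.Process(target=process_main, args=(input_queue, output_queue))
                 for _ in range(n_procs)]
    for p in processes:
        p.start()
    for x in range(100):
        input_queue.put(x)
    # one sentinel per thread per process so everything shuts down cleanly
    for _ in range(25 * n_procs):
        input_queue.put(None)
    for p in processes:
        p.join()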

After my first tests I got a deadlock because the thread responsible for feeding and flushing the queue was hanging. After a while I found that I can use the queue's cancel_join_thread() method to work around this problem.
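
For reference, this is how I apply the workaround (a sketch; calling cancel_join_thread() means process exit no longer waits for the feeder thread, so buffered items may be lost):

import multiprocessing

queue = multiprocessing.Queue()
queue.put("some work item")
# tell this process not to block on exit waiting for the feeder
# thread to flush the queue; any data still buffered may be lost
queue.cancel_join_thread()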

But because of the possibility of data loss I would like to use multiprocessing.Manager().Queue() instead.

Now the actual question: is it better to use one manager object for each queue, or should I create one manager and get two queues from the same manager object?

# One manager for all queues
import multiprocessing

manager = multiprocessing.Manager()
input_queue = manager.Queue()
output_queue = manager.Queue()

...Magic...

# As many managers as queues
manager_in = multiprocessing.Manager()
queue_in = manager_in.Queue()

manager_out = multiprocessing.Manager()
queue_out = manager_out.Queue()

...Magic...

Thank you for your help.

dvonessen

1 Answer


There is no need to use two separate Manager objects. As you have already seen, the Manager object allows sharing objects among multiple processes; from the docs:

Managers provide a way to create data which can be shared between different processes. A manager object controls a server process which manages shared objects. Other processes can access the shared objects by using proxies.

Therefore, if you have two different queues, you can still use the same manager. In case it helps someone, here is a simple example using two queues with one manager:

from multiprocessing import Manager, Process
import time


class Worker(Process):
    """
    Simple worker.
    """

    def __init__(self, name, in_queue, out_queue):
        super(Worker, self).__init__()
        self.name = name
        self.in_queue = in_queue
        self.out_queue = out_queue

    def run(self):
        while True:
            # grab work; do something to it (+1); then put the result on the output queue
            work = self.in_queue.get()
            print("{} got {}".format(self.name, work))
            work += 1

            # sleep to allow the other workers a chance (b/c the work action is too simple)
            time.sleep(1)

            # put the transformed work on the queue
            print("{} puts {}".format(self.name, work))
            self.out_queue.put(work)


if __name__ == "__main__":
    # construct the queues
    manager = Manager()
    inq = manager.Queue()
    outq = manager.Queue()

    # construct the workers
    workers = [Worker(str(name), inq, outq) for name in range(3)]
    for worker in workers:
        worker.start()

    # add data to the queue for processing
    work_len = 10
    for x in range(work_len):
        inq.put(x)

    while outq.qsize() != work_len:
        # waiting for workers to finish
        print("Waiting for workers. Out queue size {}".format(outq.qsize()))
        time.sleep(1)

    # clean up
    for worker in workers:
        worker.terminate()

    # print the outputs
    while not outq.empty():
        print(outq.get())

Using two managers instead like so:

# construct the queues
manager1 = Manager()
inq = manager1.Queue()
manager2 = Manager()
outq = manager2.Queue()

works, but there is no need. Note also that each Manager() starts its own server process, so a single manager keeps the process count down.

Paul
  • Thank you. That's exactly how I implemented the manager and it's queues. – dvonessen Jun 19 '18 at 19:40
  • @Paul I added a line `print("pid: {}".format(os.getpid()))` in the `__init__`. It turns out all the processes created as such have the same pid, hence share the same memory? How can you create different processes with the same queuing mechanism? I tried: `workers = [Process(target = Worker, args = ((str(name), inq, outq))) for name in range(3)]` but the worker does not seem to receive the messages on 'inq'. Any help appreciated! Thanks! – Jimmy Nov 04 '18 at 13:26
  • 1
    @Jimmy try putting that same `print` into the `run(..)` method of the Worker processes. You'll see that that code is running in a different process so they are indeed different processes. I can't find a good link right now but while the `__init__` is run on the main process there is another process which is `forked/spawned` and that is where the `run` is going to be executing. – Paul Nov 05 '18 at 14:12
  • @Paul Interesting. Thanks for the reply. – Jimmy Nov 07 '18 at 10:10
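
To make the point from the last comments concrete, here is a minimal sketch (the PidDemo class name is made up for this demonstration): __init__ prints the parent's pid, while run() prints the child's.

from multiprocessing import Process
import os


class PidDemo(Process):
    def __init__(self):
        super(PidDemo, self).__init__()
        # runs in the parent process: every instance reports the same pid here
        print("__init__ pid: {}".format(os.getpid()))

    def run(self):
        # runs in the forked/spawned child: each worker reports its own pid
        print("run pid: {}".format(os.getpid()))


if __name__ == "__main__":
    workers = [PidDemo() for _ in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()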