
Working up from threads to processes, I have switched to concurrent.futures, and would like to gain/retain the flexibility of switching between a ThreadPoolExecutor and a ProcessPoolExecutor for various scenarios. However, despite the promise of a unified facade, I am having a hard time passing multiprocessing Queue objects as arguments to the executor's submit() when I switch to using a ProcessPoolExecutor:

import multiprocessing as mp
import concurrent.futures

def foo(q):
    q.put('hello')

if __name__ == '__main__':

    executor = concurrent.futures.ProcessPoolExecutor()
    q = mp.Queue()
    p = executor.submit(foo, q)
    p.result()
    print(q.get())

bumps into the following exception coming from multiprocessing's code:

RuntimeError: Queue objects should only be shared between processes through inheritance

which I believe means that it doesn't like receiving the queue as an argument, but rather expects the worker process to (not in any OOP sense) "inherit" the queue on the multiprocessing fork, rather than have it passed in afterwards.

The twist is that with bare-bones multiprocessing, meaning when not using it through the facade that concurrent.futures provides, there seems to be no such limitation, as the following code works seamlessly:

import multiprocessing as mp

def foo(q):
    q.put('hello')

if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=foo, args=(q,))
    p.start()
    p.join()
    print(q.get())

I wonder what I am missing here: how can I make the ProcessPoolExecutor accept the queue as an argument when using concurrent.futures, the same way it does when using the ThreadPoolExecutor, or when using multiprocessing directly as shown right above?

matanster
  • You generally can't pass a `multiprocessing.Queue` as an argument _after_ a `Process` has already started; you need to pass it to the constructor of the `Process` object. When you use some sort of multiprocessing pool, you need the `initializer` parameter to provide a function which registers the shared object as a `global` in the worker process, and then pass the object itself via `initargs` upon pool initialization. So in your case this would be `ProcessPoolExecutor(initializer=init_q, initargs=(q,))` with `def init_q(q): globals()['q'] = q`... (a minimal sketch of this appears after the comments) – Darkonaut Aug 14 '20 at 21:43
  • ... But sharing a queue in a `Pool` at all makes little sense, since pools already provide the basic plumbing under the hood. Sparing you this kind of boilerplate for setting up queues is the whole point of using one in the first place. You just need to let your target function return something and then wait on the result with `print(executor.submit(foo).result())`; that's also how you get the easy switching flexibility you asked for (also sketched after the comments). – Darkonaut Aug 14 '20 at 21:44
  • I use threads/processes as long-running "daemons" that need to communicate some state back to the parent every so often and get a message from the parent every so often, not as one-time tasks having a single result. I therefore opt to use queues for this communication, as they are available for both threads and processes. And since each thread/process has its own logical role, each process gets a different queue for messages from the main thread/process, which the `initializer` approach does not seem to cater for. – matanster Aug 14 '20 at 22:20
  • If each process has its own role, that's not the use case of anonymous workers for batch processing that the stdlib pools are designed for. As a consequence, if you use any pool method, you have no control over which worker ends up receiving the item. I'd suggest you build your custom pool object yourself. You're already knee-deep in doing so by trying to use additional queues within a stdlib pool; just set the stdlib pool aside and use `Process` or `Thread` with appropriate queues, and wrap it up in a class of your own making which you can initialize to use either threads or processes (roughly sketched after the comments). – Darkonaut Aug 14 '20 at 23:32
  • For completeness, `multiprocessing.Manager().Queue` would be a queue whose proxies you could pass to already-running threads/processes in a pool (also sketched after the comments), but stuffing this into a stdlib pool is really ugly and error-prone for various reasons. – Darkonaut Aug 14 '20 at 23:33
  • @Darkonaut thanks for your help at the various levels of this! It mirrors my own progress in the meanwhile. Using a `multiprocessing.Manager()`-managed Queue does seem to solve my use case, but I should make more progress with this setup before posting my own answer with confidence. You are right that I am knee-deep already, so if I find along the way that my solution based on `multiprocessing.Manager().Queue` is inadequate, I will probably do as you suggest. – matanster Aug 15 '20 at 12:40
  • That said, which `Process` class did you actually suggest in your second-to-last comment? Do you find Python's [`processing`](https://pypi.org/project/processing/) library to be good infrastructure for my "communicating daemons" use case? – matanster Aug 15 '20 at 12:43
  • I'm speaking of `multiprocessing.Process`, which is based on the package from Oudkerk you linked, which got merged into the stdlib back then. It's been in the stdlib for over a decade, so you can trust it. – Darkonaut Aug 15 '20 at 13:29
  • I've formulated some of my concerns about using `multiprocessing.Manager().Queue` with pools [here](https://stackoverflow.com/a/55577761/9059420). You're going to pay an unnecessary performance penalty (the extent depends on your actual workload, though) for routing communication that was intra-process in your threading-only setup over the separate server process you get with the manager. So if someone asks, you don't have that idea from me ;) – Darkonaut Aug 15 '20 at 13:29
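
A minimal sketch of the initializer/initargs route described in the first comment, assuming Python 3.7+ (where ProcessPoolExecutor gained the initializer and initargs parameters); the queue travels to the workers at pool start-up rather than through submit():

    import multiprocessing as mp
    import concurrent.futures

    def init_q(q):
        # runs once in each worker process and registers the queue as a global there
        globals()['q'] = q

    def foo():
        # uses the queue registered by the initializer instead of taking it as an argument
        q.put('hello')

    if __name__ == '__main__':
        q = mp.Queue()
        executor = concurrent.futures.ProcessPoolExecutor(initializer=init_q, initargs=(q,))
        executor.submit(foo).result()
        print(q.get())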
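
The return-a-result route from the second comment needs no queue at all, which is also what gives the executor-switching flexibility the question asks about; a minimal sketch:

    import concurrent.futures

    def foo():
        return 'hello'

    if __name__ == '__main__':
        for pool_cls in (concurrent.futures.ThreadPoolExecutor,
                         concurrent.futures.ProcessPoolExecutor):
            with pool_cls() as executor:
                print(executor.submit(foo).result())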
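
A rough sketch of the hand-rolled alternative suggested in the comment about building a custom pool object: one long-running worker per role, each with its own queues, backed by either a Thread or a Process behind a small class. The names (worker_loop, RoleWorker) are made up here purely for illustration:

    import multiprocessing as mp
    import threading
    import queue

    def worker_loop(inbox, outbox):
        # long-running "daemon": handle messages until a None sentinel arrives
        for msg in iter(inbox.get, None):
            outbox.put('handled: %s' % msg)

    class RoleWorker:
        # hypothetical wrapper; a flag switches between a thread and a process
        def __init__(self, use_process):
            if use_process:
                self.inbox, self.outbox = mp.Queue(), mp.Queue()
                self._unit = mp.Process(target=worker_loop, args=(self.inbox, self.outbox))
            else:
                self.inbox, self.outbox = queue.Queue(), queue.Queue()
                self._unit = threading.Thread(target=worker_loop, args=(self.inbox, self.outbox))
            self._unit.start()

        def stop(self):
            self.inbox.put(None)   # sentinel makes worker_loop return
            self._unit.join()

    if __name__ == '__main__':
        w = RoleWorker(use_process=True)
        w.inbox.put('hello')
        print(w.outbox.get())
        w.stop()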
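
And a minimal sketch of the Manager-based queue mentioned in the later comments; the proxy it returns can be passed through submit() because it pickles cleanly, at the cost of routing everything through the manager's server process:

    import multiprocessing as mp
    import concurrent.futures

    def foo(q):
        q.put('hello')

    if __name__ == '__main__':
        with mp.Manager() as manager:
            q = manager.Queue()   # proxy to a queue living in the manager's server process
            with concurrent.futures.ProcessPoolExecutor() as executor:
                executor.submit(foo, q).result()
            print(q.get())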

0 Answers