I'm working with a `Backend` class that spawns a subprocess to perform the CPU-bound work. I have no control over that class, and essentially the only way to interact with it is to create an instance via `backend = Backend()` and submit work via `backend.run(data)` (which in turn hands the work to the subprocess and blocks until completion). Because these computations take quite some time, I'd like to perform them in parallel. Since the `Backend` class already spawns its own subprocess to do the actual work, this is effectively an IO-bound situation from the caller's point of view.

So I thought about using multiple threads, each of which uses its own `Backend` instance. I could create these threads manually and connect them via queues. The following is an example implementation with a mock `Backend` class:

import os
import pty
from queue import Queue
from subprocess import PIPE, Popen
from threading import Thread


class Backend:
    def __init__(self):
        f, g = pty.openpty()  # pseudo-terminal pair: bash writes to g, we read lines from f
        self.process = Popen(
            ['bash'],  # example program
            text=True, bufsize=1, stdin=PIPE, stdout=g)
        self.write = self.process.stdin.write
        self.read = os.fdopen(f).readline

    def __enter__(self):
        self.write('sleep 2\n')  # startup work
        return self

    def __exit__(self, *exc):
        self.process.stdin.close()
        self.process.kill()

    def run(self, x):
        self.write(f'sleep {x} && echo "ok"\n')  # perform work
        return self.read().strip()


class Worker(Thread):
    def __init__(self, inq, outq, **kwargs):
        super().__init__(**kwargs)
        self.inq = inq
        self.outq = outq

    def run(self):
        with Backend() as backend:
            while True:  # runs until the interpreter exits (daemon thread)
                data = self.inq.get()
                result = backend.run(data)
                self.outq.put((data, result))


task_queue = Queue()
result_queue = Queue()

n_workers = 3
threads = [Worker(task_queue, result_queue, daemon=True) for _ in range(n_workers)]
for thread in threads:
    thread.start()

data = [2]*7
for x in data:
    task_queue.put(x)

for _ in data:
    print(f'Result ready: {result_queue.get()}')

Since the `Backend` needs to perform some work at startup, I don't want to create a new instance for each task. Hence each `Worker` creates one `Backend` instance for its whole life cycle. It's also important that each worker has its own backend, so they won't interfere with each other.

Now here's the question: Can I also use `concurrent.futures.ThreadPoolExecutor` to accomplish this? It looks like the `Executor.map` method would be the right candidate, but I can't figure out how to ensure that each worker receives its own instance of `Backend` (which needs to persist between tasks).

a_guest
  • @EricBurel Can you elaborate on what you mean by *"it doesn't guarantee that further calls to the pool will update each process state (say that only process 1 and 2 are called)"*? – aaron Jul 15 '23 at 10:53
  • Say each process has a chunk of data, and you want to filter those data: you need to transmit a "filter" call to each process. But that's just not doable with executors; you have to manage a process pool by hand in this scenario, and if you need to call them in a coroutine you can use either "aiopipe" or multiprocessing queues. – Eric Burel Jul 17 '23 at 07:10
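
For illustration, here is a minimal, self-contained sketch of the per-process broadcast pattern described in the comment above, managing `multiprocessing` workers and queues by hand (all names are illustrative, not from the thread):

from multiprocessing import Process, Queue


def worker(chunk, inq, outq):
    data = list(chunk)  # per-process state: this worker's chunk of the data
    while True:
        command = inq.get()
        if command is None:  # sentinel: shut down
            break
        if command == 'filter':  # example broadcast command
            data = [x for x in data if x % 2 == 0]
            outq.put(len(data))


if __name__ == '__main__':
    chunks = [range(0, 10), range(10, 20), range(20, 30)]
    channels = [(Queue(), Queue()) for _ in chunks]
    workers = [Process(target=worker, args=(chunk, inq, outq))
               for chunk, (inq, outq) in zip(chunks, channels)]
    for w in workers:
        w.start()

    for inq, _ in channels:  # broadcast 'filter' to *every* process
        inq.put('filter')
    print([outq.get() for _, outq in channels])  # -> [5, 5, 5]

    for inq, _ in channels:  # shut all workers down
        inq.put(None)
    for w in workers:
        w.join()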

2 Answers

The state of worker threads can be saved in the global namespace, e.g. as a dict. Then `threading.current_thread` can be used to save/load the state for each of the workers. `contextlib.ExitStack` can be used to handle `Backend` appropriately as a context manager.

from concurrent.futures import ThreadPoolExecutor
from contextlib import ExitStack
import os
import pty
from subprocess import PIPE, Popen
import threading


class Backend:
    ...  # Backend as defined in the question


backends = {}
exit_stack = ExitStack()


def init_backend():
    backends[threading.current_thread()] = exit_stack.enter_context(Backend())


def compute(data):
    return data, backends[threading.current_thread()].run(data)


with exit_stack:
    with ThreadPoolExecutor(max_workers=3, initializer=init_backend) as executor:
        for result in executor.map(compute, [2]*7):
            print(f'Result ready: {result}')
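
For what it's worth, the executor's `initializer` runs once in each worker thread before that thread handles any task, which is what guarantees exactly one `Backend` per thread. If you'd rather not rely on module-level globals, the same state can be passed in explicitly via `initargs` and `functools.partial`; here is a minimal sketch of that variant (the parameterized `init_backend`/`compute` signatures are an assumption, not part of the original answer):

from concurrent.futures import ThreadPoolExecutor
from contextlib import ExitStack
from functools import partial
import threading

# Backend: the class from the question (or the mock above)


def init_backend(backends, exit_stack):
    # runs once in each worker thread before it processes any task
    backends[threading.current_thread()] = exit_stack.enter_context(Backend())


def compute(backends, data):
    return data, backends[threading.current_thread()].run(data)


backends = {}
with ExitStack() as exit_stack:
    with ThreadPoolExecutor(max_workers=3, initializer=init_backend,
                            initargs=(backends, exit_stack)) as executor:
        for result in executor.map(partial(compute, backends), [2]*7):
            print(f'Result ready: {result}')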
a_guest
  • I guess this is not applicable if the worker state is so big that it doesn't fit into a single process's memory (for instance, 3 dataframes built out of a very big file)? – Eric Burel Jul 11 '23 at 14:20
  • @EricBurel What "worker state" are you talking about here? This question specifically discusses threaded workers where the only state is the `Backend` instance(s). If you are talking about `data`, then threaded workers wouldn't be used in the first place — in that case, that would be a separate question about how to load `data` as needed within each `compute` call. – aaron Jul 15 '23 at 11:13

Building on the existing answer from a_guest, you can avoid the use of globals entirely by leveraging thread-local data via `threading.local`.

Starting with an example class of:

# imports used throughout the snippets below
import logging
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import ExitStack


class Backend:
    def __enter__(self):
        logging.info("Initializing")
        return self

    def __exit__(self, *args):
        logging.info("Exiting")

    def work(self, val):
        logging.info(f"Running {val}")
        return val

We can create a thread-local wrapper for it like so:

class BackendContainer:
    def __init__(self):
        self._stack = ExitStack()
        self._locals = threading.local()

    def __enter__(self):
        return self

    def initialize(self):
        # runs once per worker thread, via the executor's initializer hook
        self._locals.backend = self._stack.enter_context(Backend())

    def __exit__(self, *args):
        self._stack.close()

    def work(self, val):
        return self._locals.backend.work(val)

Usage:

logging.basicConfig(level=logging.INFO, format="%(threadName)s: %(message)s")

with BackendContainer() as b, ThreadPoolExecutor(3, initializer=b.initialize) as ex:
    for val in ex.map(b.work, range(8)):
        pass

Output:

ThreadPoolExecutor-0_0: Initializing
ThreadPoolExecutor-0_0: Running 0
ThreadPoolExecutor-0_1: Initializing
ThreadPoolExecutor-0_0: Running 1
ThreadPoolExecutor-0_1: Running 2
ThreadPoolExecutor-0_2: Initializing
ThreadPoolExecutor-0_0: Running 3
ThreadPoolExecutor-0_1: Running 4
ThreadPoolExecutor-0_2: Running 5
ThreadPoolExecutor-0_0: Running 6
ThreadPoolExecutor-0_1: Running 7
MainThread: Exiting
MainThread: Exiting
MainThread: Exiting
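
Note that all three `Exiting` lines are logged from `MainThread`: the worker threads only create the backends, while the shared `ExitStack` closes all of them at once when the outer `with` block exits in the main thread.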

You could also avoid the initializer method entirely by creating the backend lazily on first access:

class BackendContainer:
    def __init__(self):
        self._stack = ExitStack()
        self._locals = threading.local()

    def __enter__(self):
        return self

    @property
    def backend(self):
        # lazily create this thread's Backend on first access; threading.local
        # ensures each worker thread only ever sees its own instance
        if not hasattr(self._locals, "backend"):
            self._locals.backend = self._stack.enter_context(Backend())
        return self._locals.backend

    def __exit__(self, *args):
        self._stack.close()

    def work(self, val):
        return self.backend.work(val)


logging.basicConfig(level=logging.INFO, format="%(threadName)s: %(message)s")

with BackendContainer() as b, ThreadPoolExecutor(3) as ex:
    for val in ex.map(b.work, range(8)):
        pass
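
With this lazy variant no initializer is needed, and a worker thread only pays the `Backend` startup cost the first time it actually executes `work`. Cleanup is unchanged: `__exit__` on the container still closes every backend that was entered on the shared `ExitStack`.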
flakes
  • Thanks for your answer. Is the stack/context stuff common in Python? Can you perhaps elaborate a bit on that? – Eric Burel Jul 17 '23 at 07:10
  • @EricBurel `ExitStack` and thread locals are both common and part of the standard library (they ship with the Python interpreter itself, so they are always available). You should be able to see the official docs for all of the nitty-gritty details. At a high level, `ExitStack` allows you to more easily call the `__enter__` and `__exit__` methods of context managers without the use of `with`. It is useful for cases with a dynamic number of context managers (e.g. one per thread). Thread locals are mappings that are unique to a single thread, which keeps threads from clobbering each other. – flakes Jul 17 '23 at 15:13
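
To make the comment above concrete, here is a tiny self-contained sketch of both building blocks (the names are illustrative):

from contextlib import ExitStack
import os
import threading

# ExitStack: enter any number of context managers and close them all at once.
with ExitStack() as stack:
    files = [stack.enter_context(open(os.devnull)) for _ in range(3)]
    # all three file objects are closed together when this block exits

# threading.local: each thread sees its own attribute values.
local = threading.local()

def report():
    local.name = threading.current_thread().name  # private to this thread
    print(local.name)

threads = [threading.Thread(target=report) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()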