
I like the default Python `multiprocessing.Pool`, but it's a pain that there's no easy way to show progress while the pool is executing. In light of that, I attempted to create my own custom multiprocess pool mapper, which looks like this:

from multiprocessing import Process, Pool, cpu_count
from iterable_queue import IterableQueue

def _proc_action(f, in_queue, out_queue):
    try:
        for val in in_queue:
            out_queue.put(f(val))
    except (KeyboardInterrupt, EOFError):
        pass

def progress_pool_map(f, ls, n_procs=cpu_count()):
    in_queue = IterableQueue()
    out_queue = IterableQueue()
    err = None
    try:
        procs = [Process(target=_proc_action, args=(f, in_queue, out_queue)) for _ in range(n_procs)]
        [p.start() for p in procs]
        for elem in ls:
            in_queue.put(elem)
        in_queue.close()
        bar = 0
        for _ in ls:
            elem = next(out_queue)
            bar += 1
            if bar % 1000 == 0:
                print(bar)
            yield elem
        out_queue.close()
    except (KeyboardInterrupt, EOFError) as e:
        in_queue.close()
        out_queue.close()
        print("Joining processes")
        [p.join() for p in procs]
        print("Closing processes")
        [p.close() for p in procs]
        err = e
    if err:
        raise err

It works fairly well, and prints a value to the console for every 1000 items processed. The progress display itself is something I can worry about in the future. Right now, however, my issue is that when cancelled, the operation does anything but fail gracefully. When I try to interrupt the map, it hangs on "Joining processes" and never makes it to "Closing processes". If I hit Ctrl+C again, an endless spew of BrokenPipeErrors fills the console until I send an EOF and my program stops.

Here's iterable_queue.py, for reference:

from multiprocessing.queues import Queue
from multiprocessing import get_context, Value
import queue

class QueueClosed(Exception):
    pass

class IterableQueue(Queue):
    def __init__(self, maxsize=0, *, ctx=None):
        super().__init__(
            maxsize=maxsize,
            ctx=ctx if ctx is not None else get_context()
        )
        self.closed = Value('b', False)

    def close(self):
        with self.closed.get_lock():
            if not self.closed.value:
                self.closed.value = True
                super().put((None, False))
                # throws BrokenPipeError in another thread without this sleep in between
                # terrible hack, must fix at some point
                import time; time.sleep(0.01)
                super().close()

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return self.get()
        except QueueClosed:
            raise StopIteration

    def get(self, *args, **kwargs):
        try:
            result, is_open = super().get(*args, **kwargs)
        except OSError:
            raise QueueClosed
        if not is_open:
            super().put((None, False))
            raise QueueClosed
        return result

    def __bool__(self):
        return bool(self.closed.value)

    def put(self, val, *args, **kwargs):
        with self.closed.get_lock():
            if self.closed.value:
                raise QueueClosed
            super().put((val, True), *args, **kwargs)

    def get_nowait(self):
        return self.get(block=False)

    def put_nowait(self, val):
        return self.put(val, block=False)

    def empty_remaining(self, block=False):
        try:
            while True:
                yield self.get(block=block)
        except (queue.Empty, QueueClosed):
            pass

    def clear(self):
        for _ in self.empty_remaining():
            pass

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()
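For anyone skimming: the core idea in `IterableQueue` is a `(value, is_open)` sentinel protocol, where the closing sentinel is re-posted after being consumed so that every reader eventually sees it. Stripped down to plain stdlib pieces (the doubling worker is just a toy stand-in), the pattern looks like this:

```python
from multiprocessing import Process, Queue

_SENTINEL = (None, False)  # mirrors IterableQueue's "closed" marker

def worker(q, out):
    while True:
        val, is_open = q.get()
        if not is_open:
            q.put(_SENTINEL)  # re-post so sibling workers also stop
            break
        out.put(val * 2)

def run_demo(n):
    q, out = Queue(), Queue()
    procs = [Process(target=worker, args=(q, out)) for _ in range(2)]
    for p in procs:
        p.start()
    for i in range(n):
        q.put((i, True))
    q.put(_SENTINEL)  # one sentinel is enough; workers propagate it
    for p in procs:
        p.join()
    return sorted(out.get() for _ in range(n))

if __name__ == "__main__":
    print(run_demo(5))  # [0, 2, 4, 6, 8]
```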
Maurdekye
  • if you just want progress, why not use `imap` (or even `imap_unordered`) and print progress from the results loop? note `tqdm` can be nice for progress bars – Sam Mason Aug 13 '19 at 15:09
  • You're assuming I know what `imap` is. – Maurdekye Aug 13 '19 at 15:12
  • sorry, I meant [`imap` from `multiprocessing.Pool`](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.imap) – Sam Mason Aug 13 '19 at 15:14
  • 1
    So `multiprocessing.imap` returns a generator that lazily maps the elements? – Maurdekye Aug 13 '19 at 15:17
  • yes, note that if the function runs quickly then pickling/unpickling individual objects can take a significant proportion of the time, and you want to make use of the `chunksize` parameter. you don't need this for `map` as it knows the length of the list and can calculate a sensible value – Sam Mason Aug 13 '19 at 15:20
  • How does it calculate that value, by the way? Since I know the size in advance. – Maurdekye Aug 13 '19 at 15:24
  • This solution seems to work nicely, and while it does exit more cleanly than my custom solution, it still doesn't exit completely cleanly. Each pool worker receives a `KeyboardInterrupt` which I can't catch, and whose stack trace is printed to the console. – Maurdekye Aug 13 '19 at 15:26
  • the source is https://github.com/python/cpython/blob/3.7/Lib/multiprocessing/pool.py#L385 – Sam Mason Aug 13 '19 at 15:26
  • you should be able to catch `KeyboardInterrupt`, but note that it isn't derived from `Exception` but `BaseException` so you might need to change your handling – Sam Mason Aug 13 '19 at 15:28
  • I am already catching the `KeyboardInterrupt` from the main thread, but the error is still left uncaught in the child pool worker threads. – Maurdekye Aug 13 '19 at 15:32
  • ah, so it just takes the length of the list and divides it by the number of pool workers times 4. Simple enough. – Maurdekye Aug 13 '19 at 15:33
  • workers are their own processes, you'd need to catch exceptions there as well – Sam Mason Aug 13 '19 at 15:34
  • Since I don't have any control over the worker threads, I can't actually put a `try-except` clause in them. – Maurdekye Aug 13 '19 at 16:14
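Pulling the comment suggestions together: while you can't put a `try-except` in `Pool`'s worker loop yourself, `Pool` accepts an `initializer` callable that runs in each worker at startup, and it can install a `SIG_IGN` handler so only the parent sees Ctrl+C. A sketch combining that with `imap_unordered`-based progress (`square` is a placeholder for the real work function; note `imap_unordered` defaults to `chunksize=1`, whereas `map` computes roughly `len(ls) // (n_procs * 4)` per the CPython source linked above):

```python
import signal
from multiprocessing import Pool, cpu_count

def _init_worker():
    # workers ignore Ctrl+C; only the parent process sees KeyboardInterrupt
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def square(x):  # placeholder for the real work function
    return x * x

def progress_imap(f, ls, n_procs=cpu_count(), chunksize=1):
    with Pool(n_procs, initializer=_init_worker) as pool:
        try:
            # imap_unordered yields results lazily as workers finish,
            # so progress can be reported from the consuming loop
            for done, result in enumerate(pool.imap_unordered(f, ls, chunksize), 1):
                if done % 1000 == 0:
                    print(done)
                yield result
        except KeyboardInterrupt:
            pool.terminate()  # stop workers promptly instead of hanging in join
            raise

if __name__ == "__main__":
    print(sorted(progress_imap(square, range(10))))
```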

0 Answers