We are using the ProcessPoolExecutor from concurrent.futures in a service that receives requests asynchronously and does the actual, synchronous processing in the process pool.
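
For reference, a minimal sketch of that kind of setup, with illustrative names (handle_request stands in for our actual processing):

import asyncio
from concurrent.futures import ProcessPoolExecutor

def handle_request(payload):
    # the actual, synchronous processing happens here
    return payload * 2

async def on_request(executor, payload):
    loop = asyncio.get_running_loop()
    # hand the blocking work off to the process pool
    return await loop.run_in_executor(executor, handle_request, payload)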

At some point we ran into a situation where the process pool was exhausted, so new requests had to wait until other processes had finished.

Is there a way to interrogate the process pool for its current usage? That would allow us to monitor its state and do proper capacity planning.

If there isn't, is there any good alternative process pool implementation with an asynchronous interface that supports such monitoring/capacity planning?

moritz
  • Simply look at the length of the work queue `ProcessPoolExecutor._pending_work_items`. If it's greater than zero, you have work items waiting (see the sketch after these comments). – fpbhb Feb 04 '18 at 09:35
  • @fpbhb that's a private attribute, which is good reason not to use it, and it's also a binary signal, so not good for preventive measures. So, thanks, but I had hoped for something better. – moritz Feb 04 '18 at 10:00
  • This is Python, isn't it? But that aside: what do you want to achieve? Adjust the number of workers dynamically to *never* have a job waiting? That is neither supported by `concurrent.futures` nor by `multiprocessing.pool`. It's also kind of pointless, as once your hardware resources are exhausted something *has* to wait. – fpbhb Feb 04 '18 at 11:10
  • @fpbhb add more (virtual) hardware, for example – moritz Feb 04 '18 at 20:34
  • for that you’d need a pool of processes spread over more than one machine. That cannot be done using the process pool you’re using now. You’ll need a networked mechanism to distribute load, eg AMQP or similar. – fpbhb Feb 04 '18 at 20:43
  • @fpbhb there's already AMQP involved, and the service already runs on two machines. – moritz Feb 05 '18 at 09:38
  • set the number of workers so that they saturate the node when there is work to do; if not, they’ll just sit around with near zero overhead – fpbhb Feb 05 '18 at 17:41
  • @fpbhb Thanks. I can't find the information about `ProcessPoolExecutor._pending_work_items` anywhere else. It is helpful to know how many items are pending in the queue. – Ben L Feb 07 '23 at 22:35
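
For illustration, the approach from the first comment looks roughly like this. Note that `_pending_work_items` is a private attribute of the CPython implementation and may change between versions:

from concurrent.futures import ProcessPoolExecutor

executor = ProcessPoolExecutor(max_workers=2)
futures = [executor.submit(pow, 2, n) for n in range(10)]

# _pending_work_items maps internal work ids to items that have been
# submitted but not yet completed; a nonzero length means work is
# queued or still running
print('pending work items:', len(executor._pending_work_items))

executor.shutdown()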

2 Answers


The simplest way would be to extend ProcessPoolExecutor with the desired behaviour. The example below keeps the stdlib interface and does not access implementation details (a lock guards the counter, since done callbacks run on the executor's internal thread):

import threading
from concurrent.futures import ProcessPoolExecutor


class MyProcessPoolExecutor(ProcessPoolExecutor):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # count of futures submitted but not yet completed
        self._running_workers = 0
        # done callbacks fire on the executor's internal thread, so
        # guard the counter against concurrent updates
        self._lock = threading.Lock()

    def submit(self, *args, **kwargs):
        future = super().submit(*args, **kwargs)
        with self._lock:
            self._running_workers += 1
        future.add_done_callback(self._worker_is_done)
        return future

    def _worker_is_done(self, future):
        with self._lock:
            self._running_workers -= 1

    def get_pool_usage(self):
        return self._running_workers
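
A quick usage sketch (hypothetical; time.sleep stands in for real work):

import time

def work(seconds):
    time.sleep(seconds)
    return seconds

if __name__ == '__main__':
    with MyProcessPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(work, 1) for _ in range(8)]
        print('in flight:', executor.get_pool_usage())  # 8 submitted, not yet done

Note that the counter tracks futures that have been submitted but not yet completed, so it includes work items still waiting in the queue as well as those actually running.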
void
  • I had to fix the argument list for `_worker_is_done` (already fixed in the text above): it receives the future as an argument, so it needs one more parameter besides `self`. Now it works, thanks! – moritz Feb 12 '18 at 14:58

I have recently solved this problem for myself in a slightly different way. Simplified, here’s what I did:

  • I keep track of pending futures externally in a set that is defined in the scope of my main loop.
  • I attach a callback to each future, and this callback is a closure over the set of futures, allowing it to remove the future from the set when done.

So, given that done() is the actual callback function, defined elsewhere, the following is defined in the scope of my main loop:

bag = set()

def make_callback(b):
    # the returned callback is a closure over the set b
    def callback(f):
        b.remove(f)  # the future is no longer pending
        done(f)      # delegate to the actual done-handler

    return callback

For each future f that I submit to the ProcessPoolExecutor, I add it to the set and attach the callback:

bag.add(f)
f.add_done_callback(make_callback(bag))

At any time, it’s possible to see a list of pending and running futures by looking at the contents of bag, optionally filtered by the result of the future’s running() method. E.g.:

print(*bag, sep='\n')
print('running:', *(f for f in bag if f.running()))

For many straightforward use cases, a module-level set variable would probably work just as well as the closure.
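
Put together, a minimal self-contained version of this approach might look as follows (work and done are placeholder functions):

from concurrent.futures import ProcessPoolExecutor

def work(x):
    return x * x

def done(f):
    print('finished:', f.result())

def main():
    bag = set()

    def make_callback(b):
        def callback(f):
            b.remove(f)
            done(f)
        return callback

    with ProcessPoolExecutor(max_workers=2) as executor:
        for i in range(6):
            f = executor.submit(work, i)
            bag.add(f)
            f.add_done_callback(make_callback(bag))
        print('pending or running:', len(bag))

if __name__ == '__main__':
    main()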

wjv