
The following code deadlocks instead of exiting when SystemExit is raised in the asyncio task.

import asyncio
import multiprocessing as mp

def worker_main(pipe):
    try:
        print("Worker started")
        pipe.recv()
    finally:
        print("Worker exiting.")

async def main():
    _, other_end = mp.Pipe()
    worker = mp.Process(target=worker_main, args=(other_end,))
    worker.daemon = False  # the default; setting True makes multiprocessing explicitly terminate the worker at exit
    worker.start()
    await asyncio.sleep(2)
    print("Main task raising SystemExit")
    raise SystemExit

if __name__ == "__main__":
    asyncio.get_event_loop().run_until_complete(main())

It prints just the following and then hangs:

Worker started
Main task raising SystemExit

If I then press Ctrl+C I get a traceback indicating it was waiting for p.join() in multiprocessing.util._exit_function().

This is with Python 3.9.5 on Windows 10.

Superficially the reason seems to be as follows:

When SystemExit is raised in the asyncio task it appears that all tasks are ended and then SystemExit is re-raised outside the event loop. The multiprocessing library has registered multiprocessing.util._exit_function() with atexit.register(), and it gets called as the main process exits. This executes p.join() on the worker process. Crucially, this happens before the pipe is closed, so it deadlocks with the worker waiting for the pipe to be closed (EOFError) while the main process waits for the worker to exit.

The solution/workaround seems to be to make the worker a daemon process so that _exit_function() will explicitly terminate it before p.join(), which breaks the deadlock. The only problem with this is that it prevents the worker doing any clean-up before it exits.
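For reference, the workaround is a one-line change to the example above (a sketch only; I wouldn't want this in production precisely because the worker's finally block never gets a chance to run):

```python
import asyncio
import multiprocessing as mp

def worker_main(pipe):
    try:
        # flush=True so the line is visible even though the worker
        # will later be killed without flushing its buffers
        print("Worker started", flush=True)
        pipe.recv()
    finally:
        # With daemon=True this line is unlikely ever to run:
        # daemon children are terminated, not asked to exit cleanly.
        print("Worker exiting.")

async def main():
    _, other_end = mp.Pipe()
    worker = mp.Process(target=worker_main, args=(other_end,))
    worker.daemon = True  # _exit_function() now terminates the worker before joining it
    worker.start()
    await asyncio.sleep(2)
    print("Main task raising SystemExit")
    raise SystemExit

if __name__ == "__main__":
    asyncio.run(main())
```

Here _exit_function() terminates the daemonised worker before joining it, so the process exits, but "Worker exiting." is never printed.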

The same problem does not occur in a non-asyncio application, and I'm not sure why it should be different. If the application is non-asyncio, then as the main process exits the pipe to the worker is broken and the worker exits as expected with an EOFError. I've also confirmed that if the asyncio task is allowed to exit normally but then there is a raise SystemExit after run_until_complete() returns then it behaves as for the non-asyncio case - i.e. it exits properly.
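To illustrate the EOF mechanism I'm describing, here is a minimal sketch (forcing the spawn start method, which is what Windows uses, so that the worker doesn't also inherit a duplicate handle to my_end, as it would under fork): once every handle to the parent's end of the pipe is closed, the worker's recv() raises EOFError and it can exit cleanly.

```python
import multiprocessing as mp

def worker_main(pipe):
    try:
        print("Worker started")
        pipe.recv()
    except EOFError:
        print("Worker got EOFError")
    finally:
        print("Worker exiting.")

if __name__ == "__main__":
    mp.set_start_method("spawn")  # Windows' behaviour; the worker receives only other_end
    my_end, other_end = mp.Pipe()
    worker = mp.Process(target=worker_main, args=(other_end,))
    worker.start()
    other_end.close()  # drop the parent's duplicate of the worker's end
    my_end.close()     # no handle to this end remains, so recv() raises EOFError
    worker.join()
    print("Worker exit code:", worker.exitcode)
```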

Is this a bug in Python, or is it expected to behave this way?

Ian Goldby

1 Answer

Python 3.10.0 + Windows 10 shows the same behavior.

Is this a bug in Python, or is it expected to behave this way?

Good question. In my opinion this is not a bug in Python. You have an asyncio program with a single task and a single thread. Once you start the event loop, all code in the main thread runs in the event loop. When it executes p.join(), it's a blocking call. Since there is no second thread that can cause an unblock, the program hangs at that point.
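You can see the blocking nature of join() in isolation. A minimal sketch (nothing asyncio-specific, just a worker blocked on recv(); the names here are illustrative, not from your code):

```python
import multiprocessing as mp

def block_forever(pipe):
    pipe.recv()  # blocks until the other end sends something or is closed

if __name__ == "__main__":
    my_end, other_end = mp.Pipe()
    p = mp.Process(target=block_forever, args=(other_end,))
    p.start()
    p.join(timeout=1)  # returns after about a second; the worker is still running
    print("Worker alive after join(timeout=1):", p.is_alive())
    my_end.send(None)  # unblock the worker...
    p.join()           # ...so an unbounded join can now complete
    print("Worker exit code:", p.exitcode)
```

join(timeout=1) returns with the worker still alive; only after something unblocks recv() can an unbounded join() complete. In your program nothing ever unblocks the worker, so the implicit join inside _exit_function() blocks forever.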

It's my understanding that each task in an asyncio program handles its own exceptions before (possibly) propagating them, typically to the asyncio.run() call. (At least that's my experience, and it's hard to see how it could be otherwise.) So the exception handling environment is different between sync and async programs.
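A minimal sketch of that propagation (an illustrative example, not from your code): an exception raised inside the coroutine surfaces from asyncio.run() in the caller, after the event loop has already been torn down:

```python
import asyncio

async def failing():
    raise RuntimeError("boom")

def run_and_catch():
    # The exception raised inside the task is re-raised by asyncio.run()
    # in the caller, outside the (now closed) event loop.
    try:
        asyncio.run(failing())
    except RuntimeError as exc:
        return f"caught outside the loop: {exc}"

if __name__ == "__main__":
    print(run_and_catch())
```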

Second point: there is no way that Python can possibly know how to gracefully terminate a non-daemon Process. If you designate a Process as a daemon, you are essentially telling Python that the Process can safely be terminated. As you point out, however, that will bypass any clean up required.

Since you have a non-daemon Process, it is up to your code to perform cleanup.

Third point: if you replace this line:

raise SystemExit

with this one:

raise Exception

the exact same problem occurs, and for the same reason: the program won't exit because a non-daemon Process is still running. The issue is not clean-up after SystemExit specifically, but clean-up after any exception. If you solve that problem, your question becomes moot.

You could explicitly catch SystemExit and Exception in your main function, like so:

import asyncio
import multiprocessing as mp

def worker_main(pipe):
    try:
        print("Worker started")
        pipe.recv()
    finally:
        print("Worker exiting.")

async def main():
    my_end, other_end = mp.Pipe()
    worker = mp.Process(target=worker_main, args=(other_end,))
    try:
        worker.daemon = False 
        worker.start()
        await asyncio.sleep(2)
        print("Main task raising SystemExit")
        raise SystemExit
        # raise Exception
    except (SystemExit, Exception):
        my_end.send("Bye")
        worker.join()
        raise

if __name__ == "__main__":
    asyncio.run(main())

You can see the difference between SystemExit and Exception by commenting out one or the other. Both exit gracefully. SystemExit doesn't print a traceback to the console; Exception does.
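As a variation, instead of sending a sentinel you could close the parent's end of the pipe and let the worker treat EOFError as the shutdown signal, with the clean-up in a finally block rather than an except clause. A sketch (assuming the spawn start method, Windows' default, so the worker doesn't hold an inherited duplicate of my_end; under fork it would, and the close would never produce EOF):

```python
import asyncio
import multiprocessing as mp

def worker_main(pipe):
    try:
        print("Worker started")
        try:
            pipe.recv()
        except EOFError:
            pass  # the parent closed its end: treat it as a shutdown signal
    finally:
        print("Worker exiting.")

async def main():
    my_end, other_end = mp.Pipe()
    worker = mp.Process(target=worker_main, args=(other_end,))
    try:
        worker.start()
        other_end.close()  # the parent no longer needs the worker's end
        await asyncio.sleep(2)
        print("Main task raising SystemExit")
        raise SystemExit
    finally:
        my_end.close()  # unblocks recv() in the worker with EOFError
        worker.join()

if __name__ == "__main__":
    mp.set_start_method("spawn")  # so the worker doesn't inherit my_end
    asyncio.run(main())
```

This also exits cleanly and prints "Worker exiting.", and it works even while the worker is blocked in recv() at the moment of shutdown.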

Paul Cornelius
  • Excellent points, but perhaps tangential to what surprised me. The worker exits when the pipe is broken, and this evidently happens *before* `p.join()` in the non-asyncio case, or else the non-asyncio case doesn't call `p.join()` - I'm not sure which, though I expected the breakage of the pipe to happen as a consequence of the OS cleaning up the main process. In the asyncio case `p.join()` is called *after* the event loop has ended - so why is the behaviour different from the non-asyncio case? – Ian Goldby Feb 24 '22 at 10:55
  • Obviously my example is a minimal working example. In a real application the main process would use a finally block or context manager to gracefully close the pipe and end the worker before exiting. SystemExit skips these, which is the source of the problem. But it seems necessary because I don't know any other way to reliably abort the entire process from a fire-and-forget task (where there's no chance to 'gather' the task result). – Ian Goldby Feb 24 '22 at 11:02
  • I see that `multiprocessing` is specifically designed to manage child processes that do not outlive the parent. `subprocess.Popen()` is an alternative that would work for my use case. Then the worker would exit (due to the broken pipe) *after* the main process has been cleaned up by the OS as I originally expected of the multiprocessing version (but now realise was a wrong understanding). This doesn't answer the question of why the behaviour difference between asyncio and non-asyncio though. – Ian Goldby Feb 24 '22 at 11:30