0

I'm attempting to debug performance of a running production python web service, built on top of tornado using uvloop as the asyncio event loop. In trying to improve the concurrency, I'm looking to find places where synchronous code is being executed that blocks the event loop. To this end, I've configured by service with debug and a slow_callback_duration as such:

    uvloop.install()
    loop = asyncio.get_event_loop()
    loop.set_debug(True)
    loop.slow_callback_duration = 0.05
    server()

With this code enabled, I see a large number of slow callback log lines like the following:

[W 210217 18:12:24.052 asyncio:199] Executing <Handle <TaskWakeupMethWrapper object at 0x7fad2e3cd6a8> created at /usr/local/lib/python3.7/asyncio/tasks.py:711> took 0.105 seconds
[W 210217 18:12:24.164 asyncio:199] Executing <Handle <TaskWakeupMethWrapper object at 0x7fad4d14f768> created at /usr/local/lib/python3.7/asyncio/tasks.py:711> took 0.060 seconds
[W 210217 18:12:24.255 asyncio:199] Executing <Handle <TaskWakeupMethWrapper object at 0x7fad705e2618> created at /usr/local/lib/python3.7/asyncio/tasks.py:711> took 0.083 seconds
[W 210217 18:12:24.387 asyncio:199] Executing <Handle <TaskWakeupMethWrapper object at 0x7fad9b08dd68> created at /usr/local/lib/python3.7/asyncio/tasks.py:711> took 0.077 seconds
[W 210217 18:12:24.682 asyncio:199] Executing <Handle <TaskWakeupMethWrapper object at 0x7fad3716f4c8> created at /usr/local/lib/python3.7/asyncio/tasks.py:711> took 0.058 seconds
[W 210217 18:12:24.899 asyncio:199] Executing <Handle <TaskWakeupMethWrapper object at 0x7fad9b08d528> created at /usr/local/lib/python3.7/asyncio/tasks.py:711> took 0.191 seconds
[W 210217 18:12:24.970 asyncio:199] Executing <Handle <TaskWakeupMethWrapper object at 0x7fad3706ad68> created at /usr/local/lib/python3.7/asyncio/tasks.py:711> took 0.063 seconds
[W 210217 18:12:25.117 asyncio:199] Executing <Handle <TaskWakeupMethWrapper object at 0x7fad2e57c3a8> created at /usr/local/lib/python3.7/asyncio/tasks.py:711> took 0.088 seconds

In my specific python build, that /tasks.py:711 line refers to this call inside the __done__callback of gather:

            if outer._cancel_requested:
                # If gather is being cancelled we must propagate the
                # cancellation regardless of *return_exceptions* argument.
                # See issue 32684.
                outer.set_exception(futures.CancelledError())
            else:
                outer.set_result(results) <--- This line

I'd like to understand better what this debug log line actually represents. From my (limited) understanding of the slow callback behavior it seems like the actual execution of set_result on the wrapping future for gather is taking a long time to execute, but this seems a little bit crazy. Is there a way to get more detail from this debug log and / or determine what code the event loop is actually running when it reports that slow callback?

  • Is there a way to reproduce this without tornado/uvloop? Or with those, but with a simple (and self-contained) program? Providing a runnable program that reproduces the issue would help with diagnosing the problem. AFAIK `set_result` should *not* affect execution time of the callback because it doesn't execute callbacks, it just enqueues them for execution in the next event loop iteration, as if through `call_soon`. – user4815162342 Feb 18 '21 at 14:07
  • Perhaps you have coroutines that `await` that outer future or callbacks that react to it, and those are slow. Since it's `set_result()` that caused those coroutines to be resumed or callbacks to be scheduled, asyncio reports that line as the line where the callbacks were "created". But that's misleading because `set_result` just created the wrapper for callbacks that were already present. You might want to look for code that does `await asyncio.gather(...)` and inspect what's going on between that and the next `await` - there might lie your culprit. – user4815162342 Feb 18 '21 at 14:13
  • Thanks for the comments. For others reading this post in the future, what I ended up doing was temporarily upgrading the python runtime to 3.7.6, which has slightly more verbose and detailed asyncio debug logs. From there, I was able to identify that it was in fact a number of `asyncio.gather` calls that were performing slowly, and based on the return types of the objects in those coroutines I was able to track down the problematic functions. – Jonas Bauer Feb 25 '21 at 16:18

0 Answers0