
I've recently written a client/server application using python-socketio with aiohttp. The server side is based on async namespaces, and my on_message events contain many await calls, so I must use async locks to make sure I maintain the flow I desire. To achieve this behavior I've written a decorator and wrapped every critical-section-style method with it.

    @async_synchronized('_async_mutex')
    async def on_connect(self, sid, environ):
        self._logger.info("client with sid: {} connected to namespace: {}!".format(
            sid, __class__.__name__))
        self.important_member = 1
        await other_class.cool_coroutine()
        self.important_member = 2

And in my constructor I've initialized _async_mutex = asyncio.Lock()
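For context, a stripped-down sketch of what that constructor looks like (the class name here is made up; the lock attribute name matches the decorator argument above):

```python
import asyncio


class MyNamespace:
    """Hypothetical sketch of the namespace constructor described above."""

    def __init__(self):
        # Per-instance lock consumed by the async_synchronized decorator
        self._async_mutex = asyncio.Lock()
        self._connected_clients_count = 0
```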

The decorator:

import asyncio
from functools import wraps


def async_synchronized(tlockname):
    """A decorator to place an instance-based lock around a method."""

    def _synched(func):
        @wraps(func)
        async def _synchronizer(self, *args, **kwargs):
            tlock = getattr(self, tlockname)
            async with tlock:
                return await func(self, *args, **kwargs)
        return _synchronizer

    return _synched
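As a sanity check, here is a self-contained sketch (the toy class and coroutine names are mine, not from the real application) showing that the decorator does serialize two overlapping calls on the same instance:

```python
import asyncio
from functools import wraps


def async_synchronized(tlockname):
    """Same decorator as above, repeated so this sketch runs standalone."""
    def _synched(func):
        @wraps(func)
        async def _synchronizer(self, *args, **kwargs):
            async with getattr(self, tlockname):
                return await func(self, *args, **kwargs)
        return _synchronizer
    return _synched


class Toy:
    def __init__(self):
        self._async_mutex = asyncio.Lock()
        self.events = []

    @async_synchronized('_async_mutex')
    async def work(self, tag):
        self.events.append(('start', tag))
        await asyncio.sleep(0.01)  # simulate an awaited call mid-critical-section
        self.events.append(('end', tag))


async def main():
    toy = Toy()
    # Both calls overlap in time, but the lock forces them to run one at a time
    await asyncio.gather(toy.work('a'), toy.work('b'))
    return toy.events


events = asyncio.run(main())
# Critical sections never interleave: each 'start' is followed by its own 'end'
assert events == [('start', 'a'), ('end', 'a'), ('start', 'b'), ('end', 'b')]
```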

Now everything works perfectly fine in any normal use case: closing/opening the client triggers the handlers correctly and the locks behave as expected. It's important to note that my on_disconnect function is wrapped with the exact same decorator and lock. The problem occurs when a client's network adapter is physically disconnected (normal client closure works just fine): I see that my on_disconnect event is indeed called, but another coroutine is still holding the lock. For some reason the event is triggered multiple times and eventually everything deadlocks.

I've instrumented my decorator with prints that describe the lock's status and the calling function, and also added a try/except around every async call. It seems that all of my coroutines catch a CancelledError (raised, I presume, by aiohttp), and therefore a method that "held" the lock was cancelled and the lock is never released. I've tried wrapping every async call in asyncio.shield(), but the behavior didn't change.
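For reference, here is the closest minimal standalone sketch of the cancellation scenario I can construct with plain asyncio (no socketio/aiohttp involved), along the lines suggested in the comments, using sleep() and cancel(). In this isolated form the `async with` block does release the lock when the holding task is cancelled, so I haven't managed to reproduce the stuck lock outside the real application:

```python
import asyncio


async def main():
    lock = asyncio.Lock()

    async def holder():
        async with lock:
            await asyncio.sleep(10)  # cancelled while holding the lock

    task = asyncio.ensure_future(holder())
    await asyncio.sleep(0.05)        # let holder() acquire the lock first
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return lock.locked()


still_locked = asyncio.run(main())
print(still_locked)  # False: async with released the lock despite the cancellation
```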

Is there a different approach to async locks that I should take here? (removing the locks entirely fixes the problem but may cause undefined behavior in the computational part of the application)

More code samples: The actual on_connect and on_disconnect events:

    @async_synchronized('_async_mutex')
    async def on_connect(self, sid, environ):
        self._logger.info("very good log message")
        self._connected_clients_count += 1

    @async_synchronized('_async_mutex')
    async def on_disconnect(self, sid):
        self._logger.info("very good disconnect message")
        self._connected_clients_count -= 1
        await self._another_namespace_class.inform_client_disconnect(sid) # this method is wrapped with the same decorator but with a different lock

Note: the other namespace does not have the same client connected to it. Also, when a network disconnect occurs I don't see the log messages appear either (I've set the log level to debug).

ronen48
    Can you construct a minimal example that we can run to demonstrate this issue? – user4815162342 Dec 06 '19 at 10:06
  • Do you do a lot of work in your connect handler? Not sure if this contributes to the problem, but connect and disconnect handlers are supposed to be quick, they are not a place for complex logic. I suggest you move your logic to background tasks or other events besides connect/disconnect. – Miguel Grinberg Dec 07 '19 at 11:04
  • I've added the on_connect and on_disconnect code samples (the code is written on a private network so I can't actually post the real thing, sorry..). The weird issue is that everything works perfectly when the client is closed via regular process termination, but when actually disconnecting the network adapter everything deadlocks and the server requires a manual restart. Is the disconnect detection part done in the socketio lib or is it done on the web app's lib (in my case aiohttp)? it shouldn't be much of a problem for me to change to sanic/tornado, is it something worth trying? – ronen48 Dec 07 '19 at 12:06
  • *the code is written on a private network so I can't actually post the real thing, sorry* - I didn't mean to suggest that you post the real thing, just that you construct a minimal example using sleeps and `cancel()` that demonstrates the same behavior - if possible. – user4815162342 Dec 07 '19 at 20:32
  • Your examples do not really show what's so important in your handlers that needs to be protected with locks. If all you are doing is updating a counter and logging then why lock? Also, how many clients do you have active at a time, and will more than one be affected by this network adapter disconnection that you are doing? – Miguel Grinberg Dec 07 '19 at 23:40

0 Answers