A Python 3.5 application occasionally deadlocks at runtime.
To give more details about the application, the main thread (thread A) is responsible for receiving data from an external device and sending it to an MQTT broker.
Two other threads are spawned: one checks for internet connectivity (thread B, explained in detail below), and the other runs a class that implements a watchdog to check for possible deadlocks.
Thread A is long-running and sends data to the broker as received. In the case of no Internet connectivity, it starts storing data to a local database.
Thread B checks for Internet connectivity every 3 minutes so that, when connectivity is restored, the application can send the locally stored data to the MQTT broker. Without this, the application would lose data received from the device whenever it lost Internet connectivity; instead, while offline, it stores the data locally in a SQLite3 database.
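To make the store-and-forward behaviour described above concrete, here is a minimal sketch of an offline buffer backed by sqlite3. The class name `OfflineBuffer`, the `pending` table, and the `publish` callback are all hypothetical and not taken from the application; `publish` is assumed to return True only when the broker accepted the message.

```python
import sqlite3


class OfflineBuffer:
    """Hypothetical helper: queue payloads in SQLite while offline."""

    def __init__(self, path=":memory:"):
        # check_same_thread=False lets another thread reuse the connection;
        # callers are then responsible for serialising access themselves.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS pending ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "payload TEXT NOT NULL)"
            )

    def store(self, payload):
        # The connection context manager commits on success
        # and rolls back if the statement raises.
        with self._conn:
            self._conn.execute(
                "INSERT INTO pending (payload) VALUES (?)", (payload,)
            )

    def pending_count(self):
        row = self._conn.execute("SELECT COUNT(*) FROM pending").fetchone()
        return row[0]

    def flush(self, publish):
        """Send rows oldest-first; delete a row only after publish succeeds."""
        sent = 0
        rows = self._conn.execute(
            "SELECT id, payload FROM pending ORDER BY id"
        ).fetchall()
        for row_id, payload in rows:
            if not publish(payload):
                break  # still offline: keep the remaining rows for later
            with self._conn:
                self._conn.execute(
                    "DELETE FROM pending WHERE id = ?", (row_id,)
                )
            sent += 1
        return sent
```

Deleting each row only after a successful publish means a connection drop in the middle of a flush leaves the unsent rows in place for the next attempt.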
The application runs as a systemd service, and internet connectivity is provided by a WiFi dongle attached to the system.
Recently, we encountered a case where there was no internet connectivity (all pings were getting routed to the dongle’s IP address), and when the application tried to connect to the MQTT broker, it went into deadlock and the stack trace showed that this happened at the getaddrinfo function in socket.py.
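Neither `socket.getaddrinfo` nor `socket.gethostbyname` accepts a timeout argument, so a stalled lookup blocks the calling thread for as long as the resolver takes. One possible way to bound the wait is to run the lookup in a daemon thread and stop waiting after a deadline. This is only a sketch: `resolve_with_timeout` and its `resolver` parameter are hypothetical names, and the stuck lookup is abandoned rather than cancelled.

```python
import socket
import threading


def resolve_with_timeout(hostname, timeout=5.0,
                         resolver=socket.gethostbyname):
    """Return the resolved address, or None on failure or timeout.

    Hypothetical helper. The lookup runs in a daemon thread, so a stuck
    resolver call is left behind (not cancelled) once the timeout expires.
    """
    result = {}

    def worker():
        try:
            result["address"] = resolver(hostname)
        except Exception as exc:  # e.g. socket.gaierror
            result["error"] = exc

    thread = threading.Thread(target=worker)
    thread.daemon = True  # do not keep the process alive for a stuck lookup
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return None  # lookup is still blocked; give up waiting
    return result.get("address")
```

This does not explain why the lookup hangs. It only keeps the caller from waiting indefinitely, for example `resolve_with_timeout("www.google.com", timeout=10)` before attempting the MQTT connection.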
Thread B was created to check for a successful internet connection before trying to connect to the MQTT client (to avoid deadlock). This thread also keeps checking connectivity later on, in case the Internet goes down while the application is already up and running. In this case, the application occasionally runs into a deadlock between the main thread (thread A) and thread B.
The code for thread B is shown below:
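    import socket
    from time import sleep

    from IPy import IP  # assumption: IP/iptype() come from the IPy package

    # isGoing and delay are defined elsewhere in the application.
    while isGoing:
        try:
            host = socket.gethostbyname("www.google.com")
            ip = IP(host)
            # A private address suggests the name resolved to the dongle
            # rather than to a host on the internet.
            if ip.iptype() == 'PRIVATE':
                disconnected = True
            else:
                disconnected = False
        except Exception as e:
            print(e)
            disconnected = True
        sleep(delay)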
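For comparison, here is a sketch of the same polling loop written as a thread class that shares its result through a `threading.Event` instead of a plain variable. It is not the application's code: `ConnectivityMonitor`, its parameters, and the injectable `resolver` are hypothetical, and the private-address test uses the standard-library `ipaddress` module in place of the `IP` class above.

```python
import ipaddress
import socket
import threading


class ConnectivityMonitor(threading.Thread):
    """Hypothetical variant of the thread B polling loop."""

    def __init__(self, hostname="www.google.com", delay=180.0,
                 resolver=socket.gethostbyname):
        super().__init__()
        self.daemon = True
        self.hostname = hostname
        self.delay = delay
        self.resolver = resolver
        # Set while the hostname resolves to a non-private address.
        self.online = threading.Event()
        self._stop_requested = threading.Event()

    def check_once(self):
        try:
            address = ipaddress.ip_address(self.resolver(self.hostname))
        except (OSError, ValueError):
            # Lookup failed, or the result was not a valid IP address.
            return False
        return not address.is_private

    def run(self):
        while not self._stop_requested.is_set():
            if self.check_once():
                self.online.set()
            else:
                self.online.clear()
            # wait() returns early when stop() is called, unlike sleep().
            self._stop_requested.wait(self.delay)

    def stop(self):
        self._stop_requested.set()
```

The main thread can then test `monitor.online.is_set()` before publishing, and `monitor.stop()` ends the loop without waiting out the full delay. The resolver call inside `check_once` can still block, so this changes how the state is shared, not whether the lookup hangs.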
When thread B was monitored, it was seen to hang when using the subprocess module, os.system commands, and the gethostbyname function.
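For the subprocess case, `subprocess.run` (available since Python 3.5) accepts a `timeout` argument, which kills the child and raises `subprocess.TimeoutExpired` when the deadline passes. A minimal sketch, assuming the external command is something like a single ping; the helper name `run_with_timeout` is hypothetical:

```python
import subprocess


def run_with_timeout(command, timeout=5.0):
    """Return True if the command exits with status 0 within the timeout."""
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # run() kills the child before raising
    except OSError:
        return False  # command not found or not executable
    return completed.returncode == 0
```

A call might look like `run_with_timeout(["ping", "-c", "1", "8.8.8.8"], timeout=5)`, where the ping flags are an assumption based on common Linux usage. `os.system` offers no equivalent timeout, so a hanging command there blocks the calling thread until it exits.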
Note: Paho MQTT on_connect and on_disconnect callbacks are already used to check for connectivity.