
A Python 3.5 application occasionally deadlocks while running.

To give more detail about the application, the main thread (Thread A) receives data from an external device and sends it to an MQTT broker.

Two other threads are spawned: one (Thread B, explained in detail below) checks for Internet connectivity, and the other runs a class that implements a watchdog to check for possible deadlocks.
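The question does not show the watchdog class. One common design is a heartbeat watchdog: each monitored thread records a timestamp on every loop iteration, and the watchdog reports any thread whose last heartbeat is too old. A minimal sketch, in which `HeartbeatWatchdog`, `beat()`, `stalled()` and `run()` are hypothetical names rather than the application's actual code:

```python
import threading
import time


class HeartbeatWatchdog:
    """Reports threads whose last heartbeat is older than `stall_after` seconds."""

    def __init__(self, stall_after):
        self.stall_after = stall_after
        self._lock = threading.Lock()
        self._last_beat = {}

    def beat(self, name):
        # Each monitored thread calls this once per loop iteration.
        with self._lock:
            self._last_beat[name] = time.monotonic()

    def stalled(self):
        # Names of threads that have not called beat() recently, sorted.
        now = time.monotonic()
        with self._lock:
            return sorted(name for name, last in self._last_beat.items()
                          if now - last > self.stall_after)

    def run(self, check_every, on_stall, stop):
        # Watchdog thread body: poll every `check_every` seconds until the
        # `stop` Event is set (Event.wait returns True once it is set).
        while not stop.wait(check_every):
            for name in self.stalled():
                on_stall(name)
```

Thread B would call `wd.beat("B")` once per loop. A watchdog like this can detect a thread stuck in getaddrinfo(), but it cannot unblock it; `faulthandler.dump_traceback()` can log where every thread is blocked, and exiting so that systemd restarts the service (`Restart=` in the unit file) is one possible recovery path.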

Thread A is long-running and sends data to the broker as it is received. When there is no Internet connectivity, it stores the data in a local database instead.

Thread B checks for Internet connectivity every 3 minutes so that, once connectivity is restored, the application can send the locally stored data to the MQTT broker. This covers the scenario where the application loses Internet connectivity and would otherwise lose the data received from the device; to avoid that, the application stores data locally in a SQLite3 database while it is offline.
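The store-and-forward behaviour described above (buffer to SQLite3 while offline, send the backlog once connectivity returns) could look roughly like the sketch below. `LocalBuffer`, the `pending` table and the `publish(topic, payload)` callable are hypothetical; `publish` is assumed to return True on success, for example a thin wrapper around the MQTT client's publish call.

```python
import sqlite3
import threading


class LocalBuffer:
    """Hypothetical SQLite-backed queue for messages held while offline."""

    def __init__(self, path=":memory:"):
        # check_same_thread=False lets Thread A store and Thread B drain;
        # the lock serialises all access to the shared connection.
        self._db = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        self._db.execute("CREATE TABLE IF NOT EXISTS pending ("
                         "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                         "topic TEXT NOT NULL, payload BLOB NOT NULL)")
        self._db.commit()

    def store(self, topic, payload):
        # The connection's context manager commits on success.
        with self._lock, self._db:
            self._db.execute("INSERT INTO pending (topic, payload) VALUES (?, ?)",
                             (topic, payload))

    def count(self):
        with self._lock:
            return self._db.execute("SELECT COUNT(*) FROM pending").fetchone()[0]

    def drain(self, publish):
        """Send rows oldest-first, deleting each only after publish() succeeds.

        Stops at the first failure so ordering is preserved. Assumes a single
        draining thread. Returns the number of rows sent.
        """
        with self._lock:
            rows = self._db.execute(
                "SELECT id, topic, payload FROM pending ORDER BY id").fetchall()
        sent = 0
        for row_id, topic, payload in rows:
            # publish() runs without the lock held, so a slow network call
            # cannot block store() in the receiving thread.
            if not publish(topic, payload):
                break
            with self._lock, self._db:
                self._db.execute("DELETE FROM pending WHERE id = ?", (row_id,))
            sent += 1
        return sent
```

Keeping the network call outside the lock matters here: if the publish step hangs, Thread A can still keep buffering data instead of blocking on the database.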

The application runs as a systemd service, and Internet connectivity is provided by a WiFi dongle attached to the system.

Recently, we encountered a case where there was no Internet connectivity (all pings were being routed to the dongle’s IP address). When the application tried to connect to the MQTT broker, it went into deadlock, and the stack trace showed that it was blocked in the getaddrinfo function in socket.py.
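Because getaddrinfo() goes through the C library's resolver, Python's socket timeouts (`socket.setdefaulttimeout()` or a socket's `settimeout()`) do not bound it. One way to keep the calling thread from hanging is to run the lookup in a daemon thread and wait on it with a timeout. This is a sketch under the assumption that leaving a stuck worker thread behind is acceptable; `resolve_with_timeout` is a hypothetical helper, and its `resolver` parameter exists only so the behaviour can be exercised without a network.

```python
import socket
import threading


def resolve_with_timeout(hostname, timeout, resolver=socket.gethostbyname):
    """Resolve `hostname` in a daemon thread; return the address or None.

    None means the lookup failed or did not finish within `timeout` seconds.
    The worker may stay blocked inside getaddrinfo(), but the caller is
    released, and as a daemon thread it will not keep the process alive.
    """
    result = {}

    def worker():
        try:
            result["addr"] = resolver(hostname)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            result["error"] = exc

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        return None  # still blocked in the resolver: treat as offline
    return result.get("addr")
```

For example, `resolve_with_timeout("www.google.com", 5.0)` returns None instead of blocking if the lookup stalls. A stalled worker is not cancelled, so if lookups keep hanging, one thread is left behind per check. On glibc systems the resolver's own per-query timeout and retry count can also be tuned with the `options timeout:` and `attempts:` settings in /etc/resolv.conf.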

Thread B was created to confirm that an Internet connection is available before the MQTT client tries to connect to the broker (to avoid the deadlock). It also keeps checking connectivity afterwards, in case the Internet goes down while the application is already up and running. In this case, the application occasionally runs into a deadlock between the main thread (Thread A) and Thread B.
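Thread B's verdict can be shared with Thread A through a `threading.Event`, so that Thread A waits for connectivity with a bounded timeout rather than attempting a connection that may block. The sketch also uses `stop.wait(delay)` in place of `sleep(delay)` and the `isGoing` flag, so the loop can be stopped without waiting out the full delay. `connectivity_monitor`, `wait_until_online` and the `check` callable are hypothetical names, and `check` is assumed to enforce its own timeout.

```python
def connectivity_monitor(check, online, stop, delay):
    """Thread B body: keep the `online` Event in sync with check().

    check() is a caller-supplied callable returning True when the Internet
    is reachable. The loop exits once the `stop` Event is set.
    """
    while not stop.is_set():
        try:
            reachable = bool(check())
        except Exception:
            reachable = False  # any failure counts as offline, as in the original loop
        if reachable:
            online.set()
        else:
            online.clear()
        # Like sleep(delay), but returns early as soon as `stop` is set.
        stop.wait(delay)


def wait_until_online(online, timeout):
    """Thread A: wait at most `timeout` seconds; True if connectivity was reported."""
    return online.wait(timeout)
```

Thread B might be started with `threading.Thread(target=connectivity_monitor, args=(check, online, stop, 180), daemon=True)`, where `online` and `stop` are `threading.Event()` objects, and Thread A would call `wait_until_online(online, 10)` before connecting.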

The code for Thread B is shown below:

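import socket
from time import sleep

from IPy import IP  # IP() and iptype() presumably come from the third-party IPy package

# isGoing (loop flag) and delay (seconds between checks, 180 for 3 minutes)
# are defined elsewhere in the application.
while isGoing:
    try:
        # Resolve a well-known public hostname; this call can block in getaddrinfo()
        host = socket.gethostbyname("www.google.com")
        ip = IP(host)
        # A private address means the lookup was answered locally (e.g. by the
        # dongle) rather than by the Internet, so treat it as offline
        if ip.iptype() == 'PRIVATE':
            disconnected = True
        else:
            disconnected = False
    except Exception as e:
        # Any resolution error is treated as "offline"
        print(e)
        disconnected = True
    sleep(delay)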
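A variant of the same check, sketched under the assumption that only the standard library should be used: the `ipaddress` module can stand in for IPy's `iptype()`, and a TCP connection to a literal IP address needs no DNS lookup, so the timeout passed to `socket.create_connection()` actually bounds it. `is_public_address` and `tcp_probe` are hypothetical helpers, and probing 8.8.8.8 on port 53 is an illustrative choice, not something taken from the question.

```python
import ipaddress
import socket


def is_public_address(addr):
    """Stdlib alternative to checking IPy's iptype() for 'PRIVATE'."""
    ip = ipaddress.ip_address(addr)
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)


def tcp_probe(ip, port=53, timeout=3.0):
    """True if a TCP connection to the numeric `ip` succeeds within `timeout`.

    A literal IP is resolved locally, so no network DNS query is made and
    the connect timeout bounds the whole call.
    """
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False
```

Inside the loop, `disconnected = not tcp_probe("8.8.8.8")` could replace the hostname lookup. If the dongle also answers TCP connections on behalf of remote hosts, the probe could report success while offline; in that case only an application-level check, such as the MQTT connection itself, is conclusive.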

When Thread B was monitored, it was seen to hang whether it used the subprocess module, os.system commands, or the gethostbyname function.
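`os.system()` offers no timeout, but `subprocess.run()` (new in Python 3.5) accepts one: when it expires, the child process is killed and waited for, and `subprocess.TimeoutExpired` is raised. A minimal sketch, with `run_with_timeout` as a hypothetical helper:

```python
import subprocess


def run_with_timeout(cmd, timeout):
    """Run `cmd` (a list of arguments); return its exit code, or None on timeout.

    subprocess.run() kills the child when `timeout` seconds elapse and then
    raises TimeoutExpired, so the calling thread is never blocked longer.
    """
    try:
        completed = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                   stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

For example, `run_with_timeout(["ping", "-c", "1", "-W", "2", "8.8.8.8"], timeout=5)` returns 0 if a reply arrives, a non-zero exit code if none does, and None if ping itself hangs (the `-c`/`-W` flags assume the Linux iputils ping). This bounds child processes only; an in-process `gethostbyname()` call still needs its own safeguard.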

Note: Paho MQTT on_connect and on_disconnect callbacks are already used to check for connectivity.

nbn
  • When you say deadlock, how long have you left it? (Just wondering if it's hitting the full 15 min TCP timeout.) Also, if you are using systemd, is there not a networkd event for when the network goes up/down that you could listen for? – hardillb Jul 23 '19 at 13:49
  • The issue mainly happens when there is a network connection but no Internet reachability, so networkd doesn't help, because it only reports when an interface goes up or down. I tried networkd-dispatcher as well, but it still can't differentiate between a network connection and Internet reachability. – nbn Jul 24 '19 at 08:27
