We have a business requirement to maintain the connected state of IoT Edge devices in an Azure Digital Twins instance. It should be near real time, but short delays of up to a few minutes are acceptable. That is, the Digital Twins instance has a DT entity for each IoT Edge device, and each entity has a property Online (true/false). In production we will have up to a few hundred devices in total.
We are looking for a reliable way to monitor the connected state of the Edge devices.
Our initial attempt was to subscribe an Azure Function to the Event Grid Device Connected/Disconnected notifications from IoT Hub. After initial testing we found that Event Grid apparently cannot be used as the single source of truth. After more research we found the following information:
IoT Hub does not report each individual device connect and disconnect, but rather publishes the current connection state taken at a periodic 60 second snapshot. Receiving either the same connection state event with different sequence numbers or different connection state events both mean that there was a change in the device connection state during the 60 second window.
And another one:
Azure IoT device SDKs disconnect from IoT Hub and then reconnect when they renew SAS tokens over the MQTT (and MQTT over WebSockets) protocol…. … If you're monitoring device connections with Event Hub, make sure you build in a way of filtering out the periodic disconnects due to SAS token renewal. For example, do not trigger actions based on disconnects as long as the disconnect event is followed by a connect event within a certain time span.
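To illustrate the filtering that the quote above recommends, here is a minimal sketch. The `ConnectionEvent` type, the function name, and the 90-second grace window are our own assumptions for illustration, not part of any SDK. It drops a disconnect only when the next event for the same device is a connect within the grace window.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumption: a reconnect within this many seconds cancels the disconnect.
GRACE_SECONDS = 90.0


@dataclass(frozen=True)
class ConnectionEvent:
    """Hypothetical, simplified view of one connection event for a single device."""
    timestamp: float  # seconds since epoch
    connected: bool   # True = connected, False = disconnected


def filter_transient_disconnects(
    events: Iterable[ConnectionEvent],
    grace: float = GRACE_SECONDS,
) -> List[ConnectionEvent]:
    """Return the events with short-lived disconnects removed.

    A disconnect is treated as transient when the next event in time order
    is a connect that arrives no later than `grace` seconds afterwards.
    """
    ordered = sorted(events, key=lambda e: e.timestamp)
    kept: List[ConnectionEvent] = []
    for i, event in enumerate(ordered):
        if not event.connected and i + 1 < len(ordered):
            following = ordered[i + 1]
            if following.connected and following.timestamp - event.timestamp <= grace:
                continue  # likely a SAS token renewal, skip it
        kept.append(event)
    return kept
```

Note that this works on a batch of already received events. When processing events live, a trailing disconnect can only be trusted after the grace window has passed without a connect.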
After searching further on the topic, we found the following question:
"Best way to Fetch connectionState from 1000's of devices - Azure IoTHub". The accepted answer suggests using the heartbeat pattern; however, the official documentation clearly states that it should not be used in a production environment: https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-devguide-identity-registry#device-heartbeat
The article describing the heartbeat pattern also mentions a "short expiry time pattern", but gives little detail about it. For completeness, we also found the following article: https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-how-to-order-connection-state-events But it is based on an Event Grid subscription and therefore will not provide accurate data.
Finally, after reading all of this, we have the following plan to address the problem:
- We will have an Azure Function subscribed to the Event Grid Device Connected/Disconnected notifications.
- If a DeviceConnected event is received, the function will check device connectivity immediately.
- If a DeviceDisconnected event is received, the function will wait 90 seconds, because we found that the DeviceConnected event usually arrives about 60 seconds later for a given device. After the delay it will check device connectivity.
Device connectivity will be checked by sending a cloud-to-device message with acknowledgment requested, as described here: https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-csharp-csharp-c2d#receive-delivery-feedback
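The decision logic of this plan can be sketched as follows. This is only an illustration under assumptions: `resolve_online_state`, `probe`, and `sleep` are hypothetical names, and `probe` stands in for whatever connectivity check is used. The two event type strings are the ones IoT Hub publishes to Event Grid.

```python
import time
from typing import Callable

DEVICE_CONNECTED = "Microsoft.Devices.DeviceConnected"
DEVICE_DISCONNECTED = "Microsoft.Devices.DeviceDisconnected"

# The 90-second delay from the plan: a bit longer than the ~60 seconds
# after which the matching DeviceConnected event usually arrived for us.
DISCONNECT_DELAY_SECONDS = 90


def resolve_online_state(
    event_type: str,
    device_id: str,
    probe: Callable[[str], bool],
    sleep: Callable[[float], None] = time.sleep,
    delay: float = DISCONNECT_DELAY_SECONDS,
) -> bool:
    """Return the value to write to the twin's Online property.

    `probe` is a hypothetical callable that actively checks the device and
    returns True when it responds. `sleep` is injectable so the delay can be
    replaced (for example in tests, or by a timer-based mechanism).
    """
    if event_type == DEVICE_DISCONNECTED:
        sleep(delay)  # give a possible reconnect time to happen
    elif event_type != DEVICE_CONNECTED:
        raise ValueError(f"unexpected event type: {event_type}")
    return probe(device_id)
```

A blocking wait inside the function keeps the sketch short; whether that is acceptable depends on the hosting plan and its execution timeout.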
Concerns with this solution:
- Complexity.
- The Azure Function would need the IoT Hub service connection string.
- The device disconnected event might be delayed by up to a few minutes.
Can anyone suggest a better solution? Thanks!
EDIT:
In our case, we do not use DeviceClient but ModuleClient on the Edge devices, and modules do not support C2D messages, as stated here:
https://learn.microsoft.com/en-us/azure/iot-edge/module-development?view=iotedge-2018-06&WT.mc_id=IoT-MVP-5004034#iot-hub-primitives
So we would need to use direct methods instead to test whether the device is online.
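A rough sketch of such a direct-method check is below. Everything here is an assumption for illustration: `invoke_method` is a hypothetical callable that would wrap the service SDK's module direct-method call and return the method's status code, and `ping` is a hypothetical method name that the module would have to handle itself.

```python
from typing import Callable

# Hypothetical signature: (device_id, module_id, method_name) -> status code.
InvokeMethod = Callable[[str, str, str], int]


def is_module_online(
    device_id: str,
    module_id: str,
    invoke_method: InvokeMethod,
    method_name: str = "ping",  # assumption: the module registers a handler for this
) -> bool:
    """Return True when the module answers the direct method with a 2xx status."""
    try:
        status = invoke_method(device_id, module_id, method_name)
    except Exception:
        # Assumption: an unreachable module surfaces as an exception
        # (timeout, not found). Real code should catch the SDK-specific
        # error type instead of every exception.
        return False
    return 200 <= status < 300
```

Because a direct method is a synchronous request with a timeout, the result reflects whether the module is reachable at the moment of the call, which is what the Online property needs.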