In our IoT solution, which uses Azure IoT Hub on the server side and the Azure IoT client SDK on the device side, we are seeing an intermittent delay between the time a device sends a status message over MQTT and the time IoT Hub receives it.
In some cases, IoT Hub enqueues the message about 62 seconds after the device timestamped it.
For example, for message ID 5cfbe0ac-d987-3317-bda6-f63ec5fe562a:
T1-T0 = 62 seconds
Device Timestamp (T0) = 2017-12-13T12:31:12
IoT hub Timestamp(T1) = 2017-12-13T12:32:14.3600000Z
Question:
What could cause this delay in messages reaching IoT Hub?
How can we dig further, for example by adding logs in the inner components, to find out exactly where the latency occurs?
Any suggestions?
Sample IoT Hub queued Message:
{
  "data": {
    "attributes": [
      {
        "code": "STS",
        "value": "1"
      }
    ],
    "type": "status"
  },
  "messageid": "5cfbe0ac-d987-3317-bda6-f63ec5fe562a",
  "protocol": "MQTT",
  "DeviceTimestamp": "1513168272",
  "EventProcessedUtcTime": "2017-12-13T12:32:14.0683721Z",
  "PartitionId": 5,
  "EventEnqueuedUtcTime": "2017-12-13T12:32:14.3010000Z",
  "IoTHub": {
    "MessageId": null,
    "CorrelationId": null,
    "EnqueuedTime": "2017-12-13T12:32:14.3600000Z",
    "StreamId": null
  }
}
Update: The device uses Azure client SDK version 1.1.27, and the device code is written in C. The device is not behind a firewall and does not use any special network topology; it connects to IoT Hub via MQTT. The delay is observed when status messages are sent over MQTT.
Update 2: The device adds a timestamp to the message before sending it with the Azure client SDK send method. The device timestamp and the IoT Hub timestamp match in most cases, except when we see a spike such as the message above. The device gets its time from an NTP server and sends the timestamp as epoch seconds. The IoT hub is on a higher tier (10 units).
Based on our analysis today, we suspect the Azure client SDK. Our hypothesis is that the SDK stores messages internally (in a list or queue) while it tries to establish a connection with IoT Hub. If it cannot connect, it retries after a few seconds. When it restores the connection, it delivers the messages stored in its queue. When messages were delayed, we saw a batch of them arrive together with the same delay, which supports this hypothesis.