2

Describe the bug

DeviceClient SetMethodDefaultHandlerAsync handler is not triggered on internet disconnection instantly. It triggers after 15 to 20 minutes. Below are the logs

IoT Hub connection status Changed Status: Connected Reason: Connection_Ok Time: 3:09:15 PM +02 IoT Hub connection status Changed Status: Disconnected_Retrying Reason: Communication_Error Time: 3:26:29 PM +02

I disconnected the internet at 3:10:00 PM +02 and communication error was thrown after 16 minutes. I have created a sample code which reproduces the issue

Steps to reproduce

using Microsoft.Azure.Amqp.Transport;
using Microsoft.Azure.Devices.Client;
using Microsoft.Azure.Devices.Client.Transport.Mqtt;
using Microsoft.Azure.Devices.Shared;
using System;
using System.Threading;
using System.Threading.Tasks;

namespace IOTClientTest
{
    class Program
    {
        private DeviceClient _client;
        static async Task Main(string[] args)
        {
            
            await new Program().RunAsync(collection =>
            {
                return null;
            }, (new CancellationTokenSource()).Token);
            Console.Read();
        }
        
        public async Task RunAsync(Func<TwinCollection, Task<TwinCollection>> twinUpdateHandler, CancellationToken cancellationToken)
        {
            var settings = new ITransportSettings[]
            {
                new MqttTransportSettings(TransportType.Mqtt_Tcp_Only)
                {
                    KeepAliveInSeconds = 10,
                }
            };

            _client = DeviceClient.CreateFromConnectionString(
                "HostName=XXXX;SharedAccessKey=XXXXX",
                "XXXX", settings);
            
            
            var retryPolicy = new ExponentialBackoff(5,
                TimeSpan.FromSeconds(15), TimeSpan.FromSeconds(120),
                TimeSpan.FromSeconds(5));
            _client.SetRetryPolicy(retryPolicy);

            
            _client.SetConnectionStatusChangesHandler((status, reason) =>
            {
                Console.WriteLine($"IoT Hub connection status Changed Status: {status} Reason: {reason} Time: {DateTime.Now.ToString("h:mm:ss tt zz")}");
            });
            
             await _client.OpenAsync();
            

            await _client.SetMethodHandlerAsync("executeShell", async (req, context) =>
            {
                await Task.Delay(0);
                return new MethodResponse(500);
            },null);

            await _client.SetMethodDefaultHandlerAsync(MethodHandler, null);
            
            await _client.SetDesiredPropertyUpdateCallbackAsync(async (collection, context) =>
                {
                    var updated = await twinUpdateHandler(collection);
                    await _client.UpdateReportedPropertiesAsync(updated);
                }
                , null);

            await ReceiveMessagesAsync(cancellationToken);
        }

        private async Task<MethodResponse> MethodHandler(MethodRequest methodRequest, object parameter)
        {
            await Task.Delay(0);
            return new MethodResponse(500);
        }

        private async Task ReceiveMessagesAsync(CancellationToken cancellationToken)
        {
            while (!cancellationToken.IsCancellationRequested)
            {
                var message = await _client.ReceiveAsync(TimeSpan.FromSeconds(10));
                if (message != null)
                {
                    //Do something with received message...
                }
            }
        }
        
    }
}

Expected behavior

On internet disconnection, device client should change its status in not more than 10 seconds

Actual behavior

Device client throws status change after 16 minutes

Versions used

Add following information:

  • dotnet --info .NET Core SDK (reflecting any global.json): Version: 3.1.201 Commit: b1768b4ae7

Runtime Environment: OS Name: zorin OS Version: 15 OS Platform: Linux RID: linux-x64 Base Path: /usr/share/dotnet/sdk/3.1.201/

Host (useful for support): Version: 3.1.3 Commit: 4a9f85e9f8

.NET Core SDKs installed: 2.2.402 [/usr/share/dotnet/sdk] 3.1.201 [/usr/share/dotnet/sdk]

.NET Core runtimes installed: Microsoft.AspNetCore.All 2.2.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All] Microsoft.AspNetCore.App 2.2.8 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 3.1.3 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App] Microsoft.NETCore.App 2.2.8 [/usr/share/dotnet/shared/Microsoft.NETCore.App] Microsoft.NETCore.App 3.1.3 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs: https://aka.ms/dotnet-download

Repository is available here: https://github.com/raza707mirza/iottest

I have reported a bug here too: https://github.com/Azure/azure-iot-sdk-csharp/issues/1409

  • are you facing this issue when using the latest Nuget [Microsoft.Azure.Devices.Client 1.26.0](https://www.nuget.org/packages/Microsoft.Azure.Devices.Client/1.26.0) ? – asergaz Jun 24 '20 at 00:00
  • I saw in the repo you shared that the version being used is 1.20.0. Please update to 1.26.0 and try again. – asergaz Jun 24 '20 at 00:04
  • I tried with all versions from 1.20.0 to 1.26.0 and there is issue in all of them. – Raza Ashfaq Mirza Jun 24 '20 at 06:07
  • Thanks for confirming. A retry only happens in some situations. See [ErrorDelegatingHandler](https://github.com/Azure/azure-iot-sdk-csharp/blob/master/iothub/device/devdoc/architecture.md#errordelegatinghandler) . Can you confirm that the exception you receive is of type "IotHubCommunicationException"? – asergaz Jun 24 '20 at 14:14
  • The last exception in ShouldRetry is IotHubCommunicationException Below is the trimmed serialized exception { "ClassName": "Microsoft.Azure.Devices.Client.Exceptions.IotHubCommunicationException", "Message": "Transient network error occurred, please retry.", "Data": { }, "InnerException": { "ClassName": "System.OperationCanceledException", "Message": "The operation was canceled.", "Data": { }, "InnerException": null, "HelpURL": null, "StackTraceString": " at ... – Raza Ashfaq Mirza Jun 24 '20 at 15:23
  • one other thing to try is to reduce the min and max BackOff. Take the [Default RetryPolicy](https://github.com/Azure/azure-iot-sdk-csharp/blob/master/iothub/device/devdoc/retrypolicy.md) as an example and check if it makes any difference. – asergaz Jun 24 '20 at 15:40
  • I have even tried with my custom retry policy even that doesnt makes any difference. Sdk triggers retry once it gets communication exception and that is thrown after 16 to 20 minutes – Raza Ashfaq Mirza Jun 24 '20 at 16:38
  • 1
    @RazaAshfaqMirza Having the same problem, but with SetConnectionStatusChangesHandler method. Also updated your bug report in git hub. – aDisplayName Jul 06 '20 at 03:42

2 Answers2

2

As per the outcome of discussion in github: https://github.com/Azure/azure-iot-sdk-csharp/issues/1409

The SDK is relying on the OS TCP stack to inform that the disconnect has happened, and the OS can take a couple of retries before relaying this information. This might be what is causing the connection status change handler to get invoked with a 15min delay on Linux.

For Mqtt, the client does send a ping request every 75 seconds, but does not seem to be monitoring the ping response being received from the broker.

PS: This is not occurring when using Windows 10 OS

Update

Added a fix for the mqtt layer, where the sdk now monitors for a ping response and disconnects if the delay between sending a ping request and receiving a response is >30secs (this value is currently not configurable).

For the amqp implementation, the amqp library encapsulates this ping request-response logic from us; all that the device sdk does is set the IdleTimeout.

Zoe
  • 27,060
  • 21
  • 118
  • 148
asergaz
  • 996
  • 5
  • 17