Problem description

We have deployed our chat bot to the Azure cloud as an App Service, and in Application Insights we see several exceptions of type TaskCanceledException popping up at random.

After taking a look at the logs, we have identified the three lines of code which are throwing the exceptions; I have tagged them below as TaskCanceledException source #1 to #3. Here is the simplified structure of our messaging controller, for reference:

public async Task<IHttpActionResult> Post([FromBody]BotActivity activity)
{
    try
    {
        var result = await PostInternal(activity).ConfigureAwait(false);

        return result;
    }
    catch (Exception e)
    {
        Logger.Error(e);
        throw;
    }
}

private async Task<IHttpActionResult> PostInternal(BotActivity activity)
{
    // TaskCanceledException source #1.
    await SendTypingIndicator(activity).ConfigureAwait(false);

    var type = activity.GetActivityType();
    if (type == ActivityTypes.Message)
    {
        // TaskCanceledException source #2.
        var flag = await LoadFlagFromBotState(activity).ConfigureAwait(false);

        // Some logic slightly altering flow according to value of 'flag'.

        // TaskCanceledException source #3.
        await Conversation.SendAsync(activity, () => new RootDialog()).ConfigureAwait(false);
    }

    return Ok();
}

private async Task SendTypingIndicator(BotActivity activity)
{
    var reply = activity.CreateReply();
    reply.Type = ActivityTypes.Typing;
    reply.Text = null;
    ConnectorClient connector = new ConnectorClient(new Uri(activity.ServiceUrl));

    await connector.Conversations.ReplyToActivityAsync(reply).ConfigureAwait(false);
}

private async Task<bool> LoadFlagFromBotState(BotActivity activity)
{
    using (var scope = DialogModule.BeginLifetimeScope(Conversation.Container, activity))
    {
        var botDataStore = scope.Resolve<IBotDataStore<BotData>>();
        var key = Address.FromActivity(activity);
        var userData = await botDataStore.LoadAsync(key, BotStoreType.BotUserData, CancellationToken.None).ConfigureAwait(false);

        var flag = userData.GetProperty<bool>("TheFlag");
        if (!flag)
        {
            userData.SetProperty("TheFlag", true);
            await botDataStore.SaveAsync(key, BotStoreType.BotUserData, userData, CancellationToken.None).ConfigureAwait(false);
            await botDataStore.FlushAsync(key, CancellationToken.None).ConfigureAwait(false);
        }

        return flag;
    }
}

And here are the URLs of the external dependencies which timed out, according to Application Insights:

  • For TaskCanceledException source #1:
    • https://directline.botframework.com/v3/conversations/<CONV_ID>/activities/<ACT_ID>
  • For TaskCanceledException source #2:
    • https://docdb-instance-url.documents.azure.com/dbs/<DB_NAME>/colls/BotState/docs/directline:user<USER_ID>
  • For TaskCanceledException source #3:
    • https://docdb-instance-url.documents.azure.com/dbs/<DB_NAME>/colls/BotState/docs/directline:private<CONV_ID>:<USER_ID>
    • https://docdb-instance-url.documents.azure.com/dbs/<DB_NAME>/colls/BotState/docs/directline:conversation<CONV_ID>
    • https://docdb-instance-url.documents.azure.com/dbs/<DB_NAME>/colls/BotState/docs/directline:user<USER_ID>

Additional information

  • Our bot is hosted as an App Service within the Microsoft Azure cloud.
  • We plugged in a custom BotState manager, following the documentation, so that all BotState is stored in Cosmos DB (a simplified sketch of the registration is shown after this list).
  • Our Cosmos DB instance is in the same Azure Resource Group as the bot App Service.
  • We verified the availability and throughput of our Cosmos DB instance, and everything seems to be in perfect order.
  • We are using DirectLine as the communication channel.
  • The TaskCanceledException happens randomly; there do not seem to be any specific steps which need to be performed to reproduce it.
  • We tried to adopt the async-await pattern the 'proper' way, for example by making sure that we are not mixing sync and async code, and by using ConfigureAwait(false) on every await.
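
For reference, the state store is wired up along the lines of the documented Microsoft.Bot.Builder.Azure pattern. This is a simplified sketch, typically placed in Global.asax.cs; the endpoint, key, and database/collection identifiers are placeholders rather than our real configuration:

protected void Application_Start()
{
    // Requires Autofac, Microsoft.Bot.Builder.Azure, Microsoft.Bot.Builder.Dialogs,
    // Microsoft.Bot.Builder.Dialogs.Internals and Microsoft.Bot.Connector.
    // Endpoint, key and database/collection names below are placeholders.
    var store = new DocumentDbBotDataStore(
        new Uri("https://docdb-instance-url.documents.azure.com"),
        "<COSMOS_DB_KEY>",
        "<DB_NAME>",      // database id
        "BotState");      // collection id

    Conversation.UpdateContainer(builder =>
    {
        // The persistent store backed by Cosmos DB.
        builder.Register(c => store)
            .Keyed<IBotDataStore<BotData>>(AzureModule.Key_DataStore)
            .AsSelf()
            .SingleInstance();

        // An in-memory cache on top of it, with ETag-based optimistic concurrency.
        builder.Register(c => new CachingBotDataStore(store,
                CachingBotDataStoreConsistencyPolicy.ETagBasedConsistency))
            .As<IBotDataStore<BotData>>()
            .AsSelf()
            .InstancePerLifetimeScope();
    });
}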

Conclusion

We have been trying to figure out the root cause for some time; however, we have no clue what the problem could possibly be. We ruled out networking issues, since everything here runs on Azure's PaaS offerings.

I would appreciate any advice.

Markkknk
  • Have you tried increasing the timeout? – Paulo Morgado Nov 23 '18 at 07:20
  • No, we have left the default timeout intervals in place, which are about 1 minute. Most of the time, the duration of these external calls is just a couple of milliseconds according to App Insights. – Markkknk Nov 23 '18 at 09:10
  • That would be a way of seeing if it takes more than one minute or never replies. – Paulo Morgado Nov 23 '18 at 14:49
  • The ConfigureAwait(false) is causing the HttpContext to be lost, and it is probably sometimes disposed when it is expected to be available. – Eric Dahlvang Nov 26 '18 at 17:12
  • More information about how .ConfigureAwait(false) affects httpcontext can be found in this answer: https://stackoverflow.com/a/28433417/86646 – Eric Dahlvang Nov 26 '18 at 19:52
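
Regarding the timeout suggestion from the comments, a minimal diagnostic sketch we could try: cap the connector call with a shorter explicit timeout to see whether the dependency call is merely slow or never completes. This assumes the HttpClient property inherited from Microsoft.Rest.ServiceClient can be adjusted this way; the method name and the 15-second value are illustrative only.

private async Task SendTypingIndicatorWithShortTimeout(BotActivity activity)
{
    var reply = activity.CreateReply();
    reply.Type = ActivityTypes.Typing;
    reply.Text = null;

    var connector = new ConnectorClient(new Uri(activity.ServiceUrl));

    // Shorter, explicit timeout (illustrative value). On a hang this still
    // surfaces as a TaskCanceledException, but after 15 seconds instead of
    // the default interval, which makes "slow" vs. "never replies"
    // distinguishable in Application Insights.
    connector.HttpClient.Timeout = TimeSpan.FromSeconds(15);

    await connector.Conversations.ReplyToActivityAsync(reply).ConfigureAwait(false);
}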
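
And to illustrate the HttpContext behavior mentioned in the last two comments, a minimal repro sketch for classic ASP.NET Web API (the controller name is made up for the demo): after ConfigureAwait(false), the continuation runs on a thread-pool thread without the ASP.NET synchronization context, so HttpContext.Current is typically null.

using System.Threading.Tasks;
using System.Web;
using System.Web.Http;

public class HttpContextDemoController : ApiController
{
    // HttpContext.Current is carried by the ASP.NET synchronization context.
    // ConfigureAwait(false) opts the continuation out of that context, so any
    // code after the await typically observes a null HttpContext.Current.
    [HttpGet]
    public async Task<IHttpActionResult> Get()
    {
        var beforeIsNull = HttpContext.Current == null;   // false: request context present

        await Task.Delay(100).ConfigureAwait(false);

        var afterIsNull = HttpContext.Current == null;    // typically true

        return Ok(new { beforeIsNull, afterIsNull });
    }
}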
