Problem description
We have deployed our chat bot to the Azure cloud as an App Service and within Application Insights we see several exceptions of type TaskCanceledException
popping up randomly.
After taking a look at the logs we have identified three lines of code, which are throwing the exceptions. I have tagged these as TaskCanceledException source #
. Here is simplified structure of our messaging controller for the reference:
public async Task<IHttpActionResult> Post([FromBody]BotActivity activity)
{
try
{
var result = await PostInternal(activity).ConfigureAwait(false);
return result;
}
catch (Exception e)
{
Logger.Error(e);
throw;
}
}
private async Task<IHttpActionResult> PostInternal(BotActivity activity)
{
// TaskCanceledException source #1.
await SendTypingIndicator(activity).ConfigureAwait(false);
var type = activity.GetActivityType();
if (type == ActivityTypes.Message)
{
// TaskCanceledException source #2.
var flag = await LoadFlagFromBotState(activity).ConfigureAwait(false);
// Some logic slightly altering flow according to value of 'flag'.
// TaskCanceledException source #3.
await Conversation.SendAsync(activity, () => new RootDialog()).ConfigureAwait(false);
}
return Ok();
}
private async Task SendTypingIndicator(BotActivity activity)
{
var reply = activity.CreateReply();
reply.Type = ActivityTypes.Typing;
reply.Text = null;
ConnectorClient connector = new ConnectorClient(new Uri(activity.ServiceUrl));
await connector.Conversations.ReplyToActivityAsync(reply).ConfigureAwait(false);
}
private async Task<bool> LoadFlagFromBotState(BotActivity activity)
{
using (var scope = DialogModule.BeginLifetimeScope(Conversation.Container, activity))
{
var botDataStore = scope.Resolve<IBotDataStore<BotData>>();
var key = Address.FromActivity(activity);
var userData = await botDataStore.LoadAsync(key, BotStoreType.BotUserData, CancellationToken.None).ConfigureAwait(false);
var flag = userData.GetProperty<bool>("TheFlag");
if (!flag)
{
userData.SetProperty("TheFlag", true);
await botDataStore.SaveAsync(key, BotStoreType.BotUserData, userData, CancellationToken.None).ConfigureAwait(false);
await botDataStore.FlushAsync(key, CancellationToken.None).ConfigureAwait(false);
}
return flag;
}
}
And here are the URLs of external dependencies which timed-out according to Application Insights:
- For
TaskCanceledException source #1
:https://directline.botframework.com/v3/conversations/<CONV_ID>/activities/<ACT_ID>
- For
TaskCanceledException source #2
:https://docdb-instance-url.documents.azure.com/dbs/<DB_NAME>/colls/BotState/docs/directline:user<USER_ID>
- For
TaskCanceledException source #3
:https://docdb-instance-url.documents.azure.com/dbs/<DB_NAME>/colls/BotState/docs/directline:private<CONV_ID>:<USER_ID>
https://docdb-instance-url.documents.azure.com/dbs/<DB_NAME>/colls/BotState/docs/directline:conversation<CONV_ID>
https://docdb-instance-url.documents.azure.com/dbs/<DB_NAME>/colls/BotState/docs/directline:user<USER_ID>
Additional information
- Our bot is hosted as an App Service within Microsoft Azure cloud.
- We plugged in custom BotState manager according to following documentation so that all the BotState is stored within Cosmos DB.
- Our Cosmos DB instance is within same Azure Resource Group as the bot App Service.
- We verified the availability and throughput of our Cosmos DB instance and everything seems to be perfectly in order.
- We are using DirectLine as the communication channel.
- The
TaskCanceledException
happens randomly, there does not seem to be any specific steps which needs to be performed to reproduce this. - We tried to adapt the
async-await
pattern the 'proper way', for example by making sure that we are not mixing any sync and async code and by using ConfigureAwait(false) everywhere where theawait
keyword is.
Conclusion
We have been trying to figure out the root cause for some time, however we have no clue what could be possibly the problem here. We ruled out the networking issues due to the fact that we utilize Azure's PaaS here.
I will appreciate any advice.