
We have developed a bot using the Bot Framework. It is deployed on Azure and runs on the Azure Bot Service. It currently supports two channels: Slack and MS Teams. The bot is built using the waterfall model, with a predefined structure for the steps.

Technology used:

  1. Python (aiohttp, uvloop)
  2. CosmosDB

In the development environment, the Python code runs under gunicorn with 2 workers on the P1v2 tier (1 CPU: 1 core, 3.50 GB RAM).
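For readers who want to reproduce a comparable setup, a gunicorn configuration along these lines would match the description above. This is only a sketch: the file name `gunicorn.conf.py`, the bind address and port are assumptions, and the worker class is the uvloop-based one that aiohttp ships, chosen here because the stack lists aiohttp and uvloop.

```python
# gunicorn.conf.py (hypothetical file name; values below are illustrative)

# Address and port the workers listen on. The port is an assumption.
bind = "0.0.0.0:8000"

# Two worker processes, as described in the question.
workers = 2

# aiohttp's gunicorn worker that runs the event loop on uvloop.
worker_class = "aiohttp.GunicornUVLoopWebWorker"
```

With such a file, the server would be started with something like `gunicorn -c gunicorn.conf.py app:APP`, where `app:APP` is a placeholder for the module and aiohttp application object.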

Issue: When we test the bot with as few as 2 users simultaneously, from either Slack or Teams, Azure starts throwing the following error after some time:

Failed to forward request to application. Encountered a System.Net.Http.HttpRequestException exception after 1085.488ms with message: An error occurred while sending the request.. Check application logs to verify the application is properly handling HTTP traffic.

A screenshot of the application logs is attached:

[Screenshot of the application logs]

We also looked into Azure's Traffic Manager feature and tried to implement it, but the methods it offers do not seem suitable for this case.

The error above is thrown by Azure, and the request does not reach our code. Moreover, if we send the message again after this error, it reaches our code and the code sends the response successfully, but the response never reaches the user on Slack or Teams.
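One hypothesis that could be checked here (an assumption, not something the logs above confirm) is that a blocking call inside a handler stalls the worker's event loop, so the worker cannot accept the forwarded request in time. The sketch below uses only `asyncio` and is not part of the bot code; `measure_loop_lag`, `blocking_handler` and `probe` are hypothetical names, and `time.sleep` merely stands in for any synchronous call.

```python
import asyncio
import time
from typing import List


async def measure_loop_lag(interval: float, samples: int) -> List[float]:
    """Sleep `interval` seconds repeatedly and record how late each wake-up is."""
    lags = []
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        lags.append(time.perf_counter() - start - interval)
    return lags


async def blocking_handler(seconds: float) -> None:
    """Hypothetical handler that blocks the loop (stand-in for a synchronous call)."""
    time.sleep(seconds)  # no other coroutine in this worker can run meanwhile


async def probe(block_for: float) -> float:
    """Run the lag monitor next to one blocking handler; return the worst lag seen."""
    monitor = asyncio.create_task(measure_loop_lag(interval=0.01, samples=5))
    await asyncio.sleep(0)  # yield once so the monitor starts its first sleep
    await blocking_handler(block_for)
    return max(await monitor)


if __name__ == "__main__":
    print(f"worst event-loop lag: {asyncio.run(probe(0.2)):.3f}s")
```

If a monitor like `measure_loop_lag` ran inside the real worker and reported lags of a second or more around the time of the Azure error, that would point at the application rather than the platform. Small lags would suggest looking elsewhere.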

Our expectation is that at least 2 users should be able to talk to the bot at the same time on 1 CPU, which is already a low number of users.

We have researched this issue for weeks but haven't found a working solution.

Any help would be appreciated.

Fawkes
  • When you say you're using botframework, are you using [botbuilder-python](https://github.com/microsoft/botbuilder-python)? Or have you deployed another type of bot to an App Service? – AP01 Feb 25 '22 at 22:43
  • Yes @AP01, we are using botbuilder-python. – Fawkes Feb 28 '22 at 04:53
  • Does it work with one user? Waterfall bots should work normally, I don't think there have been any major SDK changes. My instinctive guess would be some sort of synchronization issue relating to CosmosDB. Were you able to perform any tests without it? – AP01 Feb 28 '22 at 17:00
  • Yes. We were initially using an in-memory cache, and the same issue occurred. We thought multiple workers might resolve it, so we implemented Cosmos DB to keep the workers in sync. The issue still persists. The problem is that when this error occurs, the request does not even reach our code, i.e. the `api/messages` endpoint of the bot. That means the error is thrown by Azure itself. – Fawkes Mar 01 '22 at 04:57
  • Sometimes it works with 2 users, but it has never worked consistently for more than 2. When we test with 3 or 4 users simultaneously, it starts throwing the error after a few messages, and the bot freezes for the users. – Fawkes Mar 01 '22 at 04:59
  • I'm not clear on what "Traffic Manager" is referring to. Have you tried adding Application Insights? App Insights can help identify issues happening on the Azure side as well. Here's a doc on [adding telemetry](https://learn.microsoft.com/en-us/azure/bot-service/bot-builder-telemetry?view=azure-bot-service-4.0&tabs=csharp). – AP01 Mar 03 '22 at 18:32
  • We are also not clear on what "Traffic Manager" is referring to. We are planning to look for other options now. This has already taken weeks, and we don't want the bot to break in production, as there is no clear description of the error or how to resolve it. The documentation on these terms is not helpful. – Fawkes Mar 04 '22 at 09:46

0 Answers