I have a bunch of load-balanced Azure VMs running Windows Server 2019, each hosting ASP.NET MVC/Web API 2 applications in IIS. They communicate with a database in a SQL Server Managed Instance.
The VMs and the Managed Instance are in separate subnets within the same Virtual Network, and most of the time, it all works perfectly.
However, a few times per day, seemingly at random, all of the ASP.NET apps will start logging timeout errors while trying to connect to the database or to perform an operation on an already-open connection. Every single one of them.
A few minutes later, everything just starts working normally again, which is cool, but the downtime is extremely bothersome to my customers, and we just experienced an outage that brought us down for a good 45 minutes.
I have no idea what is going on or even how to approach troubleshooting it.