I am running into a persistent "Task Cancelled" exception from our SQL Server and service readiness health-check endpoints during pod restarts. The health checks are implemented with AspNetCore.Diagnostics.HealthChecks.
While a pod is restarting, the health-check endpoint that checks SQL Server readiness throws a "Task Cancelled" exception. The problem is not limited to the SQL Server health check; the service readiness health check is affected as well. We have investigated connection termination, startup delays, and timeout configuration as potential causes, but have not been able to pinpoint the source.
To help resolve this, I would like guidance on the following:
- Steps to further investigate and debug the "Task Cancelled" exception during pod restarts.
- Best practices for handling health checks during pod restarts, so that the SQL Server and the services stay available and connection disruptions are handled gracefully.
- What would be an appropriate initial delay before Kubernetes invokes the probe endpoints?
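
For context on the probe timing question, the settings I am asking about are the per-probe timing fields in the pod spec. The fragment below is only an illustrative sketch, not our actual manifest: the container name, the path `/health/ready`, the port `8080`, and all numeric values are placeholders made up for the example. As far as I understand, `initialDelaySeconds` defaults to 0 and `timeoutSeconds` defaults to 1 when omitted.

```yaml
# Illustrative container probe settings (all names and values are placeholders).
containers:
  - name: my-service            # hypothetical container name
    startupProbe:               # readiness/liveness probes are held back until this succeeds
      httpGet:
        path: /health/ready     # hypothetical endpoint path
        port: 8080              # hypothetical port
      periodSeconds: 5          # interval between startup probe attempts
      failureThreshold: 12      # roughly 60s of startup allowance (12 x 5s)
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      initialDelaySeconds: 10   # wait after container start before the first probe
      periodSeconds: 10         # interval between probes
      timeoutSeconds: 5         # a probe that takes longer than this counts as failed
      failureThreshold: 3       # consecutive failures before the pod is marked not ready
```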