Problem summary: HTTP probes towards ADFS & WAP is not enough if the ADFS service is still running but the connection between ADFS and SQL database is dead.
ADFS Environment:
Using HTTP probes in Environment: ADFS environment with HTTP Probes
HTTP probes: The normal way of having health checks towards the ADFS environment, is to setup HTTP probes that runs HTTP checks towards each WAP & ADFS server URL or IP. They run health checks over HTTP port 80. Gets a 200 (OK) returned. The response to these probe endpoints is an HTTP 200 OK and is only checking the server/service locally, with no dependence on back-end services(SQL cluster\Database)
Conclusion: Using HTTP probes towards ADFS & WAP servers is not enough
Problem description: The HTTP port is going directly to the WAP and ADFS servers respectively. This means that they only check if the servers & services themselves are OK. There's a known problem where the connection between the ADFS backend and the SQL server dies for 2-3 minutes. During this time, the ADFS backend server times out, if you're unlucky. The problem here is when the ADFS backend server times out, the ADFS service itself is still running.(so as far as the HTTP probe is concerned the ADFS is still upp and running.) The HTTP probe is signalling that the ADFS service is OK. So the load balancer is till sending end users to the ADFS service that has a dead connection towards the SQL database because its service is still running. End-users ends up getting error during authentication.
Question: How can I setup a proper health check between ADFS --> SQL cluster/database? So that you can see that communication between ADFS --> SQL does not work as intended. As in the case when the service on the ADFS servers are still running, but the database connection between ADFS and SQL database is dead. I would want that health check to be used for monitoring as a first stop. Secondary, you could build some recovery steps that could be executed thanks to this health check.