Why is Azure Load Balancer still sending traffic to nodes after the health probe is down?

Question

I have two Azure VMs sitting behind a Standard Azure Load Balancer.

The load balancer has a health probe that pings /health over HTTP on each VM every 5 seconds.

The interval is set to 5, the port to 80 with path /health, and the "unhealthy threshold" to 2.

During deployment of an application, we set the /health endpoint to return 503 and then wait 35 seconds so the load balancer can mark the instance as down and stop sending it new traffic.
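For a rough sense of whether 35 seconds is long enough, here is a small Python sketch. It assumes (this is not stated in the question) that the probe marks an instance down after `unhealthy threshold` consecutive failed probes sent every `interval` seconds, and it ignores probe timeouts; `probe_markdown_window` is a hypothetical helper.

```python
def probe_markdown_window(interval_s: float, unhealthy_threshold: int) -> tuple[float, float]:
    """Return (best, worst) seconds from the first 503 until the instance is marked down.

    Assumption: the instance is marked down after `unhealthy_threshold`
    consecutive failed probes, sent every `interval_s` seconds.
    """
    # Best case: /health starts failing just before a probe is sent, so that
    # probe already fails and only (threshold - 1) more intervals must pass.
    best = interval_s * (unhealthy_threshold - 1)
    # Worst case: it starts failing just after a probe, so a full extra
    # interval passes before the first failing probe is sent.
    worst = interval_s * unhealthy_threshold
    return best, worst


best, worst = probe_markdown_window(interval_s=5, unhealthy_threshold=2)
print(f"marked down {best}-{worst} s after /health starts returning 503 (the deployment waits 35 s)")
```

With interval 5 and threshold 2, the instance should be marked down within about 10 seconds, well inside the 35-second wait, so under these assumptions the length of the wait is unlikely to be the problem.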

However, the load balancer does not seem to fully take the VM out of rotation. It still sends inbound traffic to the down instance, causing downtime for our customers.

I can see in the IIS logs that the /health endpoint is indeed returning 503 when it should.

Any ideas what's wrong? Could it be some sort of TCP keep-alive?

score 3 · Accepted Answer · answered Jul 15 '20 at 08:18

I got confirmation from Microsoft that this is working "as intended", which makes Azure Load Balancer a bad fit for web applications. This is Microsoft's answer:

I was able to discuss your observation with the internal team.

They explained that Load Balancer does not currently have a “Connection Draining” feature and will not terminate existing connections.

Connection draining is available in Application Gateway.

I understand this is being planned for Load Balancer as part of the future roadmap. You can also add your voice to the request for this feature on Load Balancer by filling in the feedback form.

score 0 · Answer 2 · answered Jun 26 '20 at 10:11


I would suggest the following approach. Place a healthcheck.html page on each of your VMs. As long as the probe can retrieve the page, the load balancer will keep sending user requests to the VM.

When you do the deployment, simply rename healthcheck.html to a different name, such as _healthcheck.html. The probe will then start receiving HTTP 404 errors, and the load balancer will take that machine out of the load-balanced rotation.

After your deployment has completed, rename _healthcheck.html back to healthcheck.html. The Azure LB probe will start getting HTTP 200 responses again and, as a result, will start sending requests to this VM again.
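The rename step can be wrapped so the page is always restored, even if the deployment fails. This is a minimal Python sketch; `out_of_rotation` is a hypothetical helper, and `site_root` stands for whatever directory your IIS site serves healthcheck.html from.

```python
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def out_of_rotation(site_root: Path, name: str = "healthcheck.html") -> Iterator[None]:
    """Hide the probe page for the duration of a deployment (hypothetical helper).

    While the page is renamed, the probe receives HTTP 404 and, once the
    unhealthy threshold is reached, the VM is taken out of rotation.
    The page is restored even if the wrapped deployment step raises.
    """
    live = site_root / name
    hidden = site_root / f"_{name}"
    live.rename(hidden)      # probe now gets HTTP 404
    try:
        yield
    finally:
        hidden.rename(live)  # probe gets HTTP 200 again
```

Wrap the deployment in `with out_of_rotation(site_root):` and wait out the probe window before copying files. As later discussion in this thread shows, this only stops new connections from reaching the VM; already-established ones still arrive.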

Thanks, Manu

Manu Philip

This is almost how it's set up now, but the metrics page shows the probe I'm taking down as only 33.3% down, and the VM is still taking traffic. – Jun 26 '20 at 10:47
I have talked to Microsoft, and apparently Azure Load Balancer keeps already-established TCP connections alive even after the health probe takes the node out of rotation. New connections won't reach it, but all existing connections are still sent to the unhealthy node. Not very useful for web applications. – Jun 28 '20 at 15:47
You should write that as an answer! Can't believe MS is doing that. Good to know. – dariogriffo Jun 29 '20 at 09:23
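To make the behaviour Microsoft described concrete, here is a toy Python model of a pass-through load balancer. It is an illustrative assumption, not Azure's implementation: new flows are hashed onto healthy backends only, while a flow that is already in the flow table keeps its backend even after that backend's probe fails. The class name, the SHA-256 hash, and the addresses used with it are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

FiveTuple = tuple[str, int, str, int, str]  # src IP, src port, dst IP, dst port, protocol


@dataclass
class PassThroughBalancer:
    """Toy model of a pass-through L4 balancer (an assumption, not Azure's code).

    Health only affects which backend a *new* flow is assigned to; a flow
    already in the flow table keeps its backend until the connection closes.
    """
    backends: list[str]
    healthy: set[str] = field(init=False)
    flows: dict[FiveTuple, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.healthy = set(self.backends)  # every backend starts healthy

    def route(self, flow: FiveTuple) -> str:
        if flow in self.flows:  # established flow: pinned, health is ignored
            return self.flows[flow]
        candidates = [b for b in self.backends if b in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy backends")
        # New flow: hash the 5-tuple onto the currently healthy backends.
        digest = hashlib.sha256(repr(flow).encode()).digest()
        backend = candidates[int.from_bytes(digest[:4], "big") % len(candidates)]
        self.flows[flow] = backend
        return backend

    def close(self, flow: FiveTuple) -> None:
        self.flows.pop(flow, None)  # connection ended: forget the pinning
```

In this model a client that keeps one HTTP keep-alive connection open is a single flow, so every request it sends keeps landing on the drained VM until that connection closes, which matches what Microsoft described in the comments above.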

score 0 · Answer 3 · answered Jun 29 '20 at 12:51

Load Balancer is a pass-through service: it does not terminate TCP connections, so each flow is always between the client and the VM's guest OS and application. If a backend endpoint's health probe fails, established TCP connections to that endpoint continue, but Load Balancer stops sending new flows to the unhealthy instance. This is by design, to give you the opportunity to shut down gracefully from the application and avoid unexpected, sudden termination of ongoing application workflows.
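A graceful shutdown from the application could look like the following sketch, which uses Python's standard `http.server` purely as a stand-in for the real IIS application (an assumption for illustration only). When a hypothetical drain flag is set at the start of a deployment, /health returns 503 so the probe stops sending new flows, and every response carries `Connection: close` so keep-alive clients reconnect. Their new TCP connection is a new flow, which Load Balancer sends to a healthy VM. `DrainState`, `STATE`, and `Handler` are hypothetical names.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DrainState:
    """Process-wide drain flag, flipped when a deployment starts (hypothetical hook)."""

    def __init__(self) -> None:
        self._draining = threading.Event()

    def start(self) -> None:
        self._draining.set()

    @property
    def draining(self) -> bool:
        return self._draining.is_set()


STATE = DrainState()


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive by default, like a typical web server

    def do_GET(self) -> None:
        if self.path == "/health":
            status, body = (503, b"draining") if STATE.draining else (200, b"ok")
        else:
            status, body = 200, b"hello"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        if STATE.draining:
            # Tell the client to drop this keep-alive connection; send_header also
            # sets close_connection, so the server closes the socket afterwards.
            # The client's next request then opens a new TCP flow.
            self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        pass  # keep the sketch quiet
```

To try it, serve `Handler` with `ThreadingHTTPServer` and call `STATE.start()` when a deployment begins; how a real deployment would signal the running application is outside this sketch.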

You may also consider configuring TCP reset on idle (https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-tcp-reset) to reduce the number of idle connections.

Thank you for your reply. Microsoft has confirmed this: there is no connection draining option on Load Balancer. This makes it a bad fit for web applications, as we do not want to tailor our applications to a specific infrastructure (setting HTTP headers, etc.). The recommended way forward from Microsoft's perspective is to move to Application Gateway, which we did not want to use because of its aggressive pricing. – Jul 15 '20 at 08:16
