
We have a private EC2 Linux instance running behind an ALB. There is only one instance running and no auto-scaling configured.

Sometimes the ALB marks the instance as unhealthy for no apparent reason. This mostly happens when network traffic on the instance is high, which generally lasts one or two hours, but the behavior is unpredictable. When we then try to access the web application deployed on the EC2 instance, we get a 502 Bad Gateway error. The issue is resolved only after we reboot the EC2 instance.

Does an ALB perform a health check on a target group again after marking it as unhealthy? Suppose an ALB marks a target group with one EC2 instance as unhealthy, and the ALB is configured to perform a health check every 30 seconds. Will it check the same target group again 30 seconds after marking it unhealthy, or will it look for a new healthy instance?

I assume an auto-scaling configuration might resolve this problem by setting the Auto Scaling group to 1 instance, so that an instance that goes unhealthy is replaced? Our AWS architect feels that Tomcat is creating a memory leak when too many requests come in at once. Tomcat does not run in the EC2.

How can we troubleshoot this problem? I searched the system logs and configured ALB access logs, but found no clue.

In this link I see that the ALB routes requests to unhealthy targets when no other healthy target is available: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/target-group-health-checks.html

My question is: will the ALB perform health checks on the target group again after marking it as unhealthy?

Manushi

1 Answer


Indeed, even when a target is marked as unhealthy, the ALB continues to health check it. You can configure a 'healthy threshold count', which indicates how many consecutive 'healthy' responses must be received before an unhealthy host is marked as healthy again.

According to the docs:

When the health checks exceed HealthyThresholdCount consecutive successes, the load balancer puts the target back in service.

If your health check interval is 60 seconds, and the healthy threshold count is 3, it takes a minimum of 3 minutes before an unhealthy host will be marked healthy again.
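The arithmetic above can be sketched in a few lines of Python. This is only a rough estimate under the assumption that every check succeeds and that one check runs per interval; the function name `min_recovery_seconds` is made up for illustration and is not part of any AWS API.

```python
def min_recovery_seconds(interval_seconds, healthy_threshold):
    """Rough lower bound on the time before an unhealthy target is healthy again.

    Assumes one health check per interval and that every check succeeds,
    so the target needs `healthy_threshold` consecutive intervals.
    """
    if interval_seconds <= 0 or healthy_threshold <= 0:
        raise ValueError("interval and threshold must be positive")
    return interval_seconds * healthy_threshold


# The figures from the example above: 60-second interval, threshold of 3.
seconds = min_recovery_seconds(60, 3)
print(seconds, "seconds, about", seconds // 60, "minutes")
```

With the 30-second interval from the question, the same estimate would be the threshold multiplied by 30 seconds.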

LRutten
  • Thanks for this information. The ALB marks the EC2 instance as healthy only after the instance is rebooted. We could not find out why the ALB marks it as unhealthy. The timeout is 5 seconds, and 2 consecutive health check failures mark it unhealthy. How can we find out why the ALB is marking an instance as unhealthy? Which log can help with this? – Manushi Aug 19 '22 at 09:05
  • As you can see [here](https://docs.aws.amazon.com/cli/latest/reference/elbv2/describe-target-health.html), the `describe-target-health` API can provide additional information on the reason or status code the health check received when it failed. You have to provide the target group ARN as input. – LRutten Aug 19 '22 at 09:33
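
For anyone who wants to script the `describe-target-health` call from the comment above, here is a minimal Python sketch using boto3. The helper names `unhealthy_targets` and `fetch_target_health` are made up for this example, and the sample response is hand-written with a placeholder instance ID; only the response keys (`TargetHealthDescriptions`, `Target`, `TargetHealth`, `State`, `Reason`, `Description`) follow the documented shape. Treat it as a starting point, not a definitive diagnostic tool.

```python
def unhealthy_targets(response):
    """Return state, reason and description for every target that is not healthy."""
    results = []
    for desc in response.get("TargetHealthDescriptions", []):
        health = desc.get("TargetHealth", {})
        if health.get("State") != "healthy":
            results.append({
                "target": desc.get("Target", {}).get("Id"),
                "state": health.get("State"),
                "reason": health.get("Reason"),
                "description": health.get("Description"),
            })
    return results


def fetch_target_health(target_group_arn):
    """Call the real API. Needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the parsing helper above works without it

    client = boto3.client("elbv2")
    return client.describe_target_health(TargetGroupArn=target_group_arn)


# Hand-written sample shaped like a describe-target-health response.
# The instance ID and values are illustrative, not real output.
sample = {
    "TargetHealthDescriptions": [
        {
            "Target": {"Id": "i-0123456789abcdef0", "Port": 80},
            "HealthCheckPort": "80",
            "TargetHealth": {
                "State": "unhealthy",
                "Reason": "Target.Timeout",
                "Description": "Request timed out",
            },
        }
    ]
}

for item in unhealthy_targets(sample):
    print(item)
```

To run it against a real target group, pass the result of `fetch_target_health("<your target group ARN>")` to `unhealthy_targets` instead of `sample`. The `reason` field is what distinguishes, for example, a timeout from an unexpected response code.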