We have three EC2 instances, one in each availability zone (AZ) of the eu-west-1 region. They are load-balanced using ELB. We'd like to use CloudWatch to monitor how many instances are registered with the load balancer. The problem is: I don't really understand the HealthyHostCount metric.
For a deployment, we'd like to be able to de-register a single instance (take it out of the load balancer) without being notified. So the alarm would be: notify us if only 1 healthy instance has been left behind the load balancer for 5 minutes.
As far as I understand, HealthyHostCount (HHC) is the number of healthy instances that are registered with a given ELB, averaged over all AZs. If everything is okay, the HHC should be 1 (no matter over what period of time) because there is 1 instance in each AZ.
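To make my assumption concrete, here is the arithmetic I expect, as a small Python sketch. This is only my mental model of the "Average" statistic (one count per AZ, averaged across AZs), not necessarily how CloudWatch actually computes it. The function name is made up for illustration.

```python
# My mental model only: each AZ reports its own healthy-instance count,
# and the average HHC is the mean of those per-AZ counts.
def expected_average_hhc(healthy_per_az):
    """Return the mean healthy-host count across AZs (hypothetical helper)."""
    return sum(healthy_per_az) / len(healthy_per_az)

# All three instances registered, one per AZ.
all_registered = expected_average_hhc([1, 1, 1])

# Only one instance registered; the other two AZs have none.
one_registered = expected_average_hhc([1, 0, 0])

print(all_registered, round(one_registered, 2))
```

If this model were right, any threshold between 0.33 and 0.67 would separate "1 instance left" from "2 instances left".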
A couple of days ago, someone deployed without re-registering the instances, so only 1 instance was being balanced. When we noticed that, we created an alarm that was supposed to notify us when the average HHC dropped below 0.6 over a 5-minute period. (If only 1 instance is registered with the ELB, the HHC should average 0.33 for any period of time.) However, the alarm never changed to the "ALARM" state.
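For reference, the alarm is roughly equivalent to the following definition, written as keyword arguments for boto3's CloudWatch `put_metric_alarm` call. This is a sketch from memory: the alarm name and load balancer name are placeholders, and I am assuming a single 5-minute evaluation period.

```python
# Sketch of the alarm definition. Values marked "placeholder" are made up;
# EvaluationPeriods=1 is an assumption about how the alarm was configured.
def build_hhc_alarm(load_balancer_name, threshold=0.6, period_seconds=300):
    """Build put_metric_alarm keyword arguments (hypothetical helper)."""
    return {
        "AlarmName": "elb-too-few-healthy-hosts",  # placeholder
        "Namespace": "AWS/ELB",
        "MetricName": "HealthyHostCount",
        "Dimensions": [
            {"Name": "LoadBalancerName", "Value": load_balancer_name},
        ],
        "Statistic": "Average",
        "Period": period_seconds,          # 5 minutes
        "EvaluationPeriods": 1,            # assumption
        "Threshold": threshold,
        "ComparisonOperator": "LessThanThreshold",
    }

params = build_hhc_alarm("my-load-balancer")  # placeholder name

# With boto3 installed and credentials configured, this would be submitted as:
#   boto3.client("cloudwatch", region_name="eu-west-1").put_metric_alarm(**params)
```

The dictionary is built separately from the API call so that the parameters can be inspected without touching AWS.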
When I checked the HHC in CloudWatch, the values didn't make sense (a sum of 10.0 for a 5-minute interval is all I remember now).
It's all a big mess to me. Every time I think I understand the metric, the CloudWatch charts turn out to be gibberish.
Could someone please explain how to use HHC to get an alarm when only 1 instance is registered? Is average HHC the way to go or should I use another metric?