I'm trying to do something that seems like a common use case, but I haven't been able to find any information that leads to a solution.

I'm using ECS in AWS for various services. When a new build occurs in my CI, a new Docker image is pushed to ECR and a Lambda is triggered to make ECS deploy the new image. This all works great. What I'm struggling with is that I want to be notified when the new code becomes 'live', which basically equates to the moment the newly registered target becomes healthy.

Does anybody have any suggestions as to how I can trigger a notification when a new target on an ELB becomes healthy?

user1686402

1 Answer

I'm fairly confident you should be able to use the HealthyHostCount metric for your alarm, if you adjust it correctly. This assumes you're actually deploying a new EC2 host before draining off the old one. You can verify this by looking at your metrics and confirming your HealthyHostCount exceeds your DesiredHostCount for any period of time. If this isn't the case, update your question with the details, as that answer (detecting new service deployments vs new hosts) is a bit different.

Once you've verified that your HealthyHostCount does in fact exceed your DesiredHostCount, you can set up a CloudWatch Alarm to detect any maximum > [DesiredCount] for 1 out of 1 data points over any 1 minute period. The period you set here should be small enough that it doesn't overlap with successive deployments (since the max won't change within a given period if you've deployed multiple times).
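As a rough sketch of the alarm described above, here is how the settings might look through boto3's `put_metric_alarm`. Several things here are assumptions rather than details from the question: that the load balancer is an Application Load Balancer (hence the `AWS/ApplicationELB` namespace and the `TargetGroup`/`LoadBalancer` dimensions), that notifications go to an SNS topic, and all of the names and ARNs, which are hypothetical placeholders.

```python
def build_healthy_host_alarm(alarm_name, target_group, load_balancer,
                             desired_count, sns_topic_arn):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    All argument values are supplied by the caller; nothing here is taken
    from the original question.
    """
    return {
        "AlarmName": alarm_name,
        # Assumption: an Application Load Balancer target group.
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "HealthyHostCount",
        "Dimensions": [
            {"Name": "TargetGroup", "Value": target_group},
            {"Name": "LoadBalancer", "Value": load_balancer},
        ],
        # "maximum > [DesiredCount] for 1 out of 1 data points, 1 minute period"
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "Threshold": float(desired_count),
        "ComparisonOperator": "GreaterThanThreshold",
        # Assumption: an SNS topic receives the notification.
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params):
    """Create or update the alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without AWS access

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Calling `create_alarm(build_healthy_host_alarm(...))` with your own target group and load balancer dimension values would create the alarm; keeping the builder separate makes it easy to inspect the settings before anything is sent to AWS.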


[Screenshot: example]

MrDuk
  • Thanks MrDuk. This is what I have tried already, and although the alarm does sometimes fire, it doesn't fire reliably every time, presumably because the old host is drained before the 1 minute elapses. I was looking at high-resolution alarms to resolve this, but I have no clue how to define this healthy host count in a custom metric. Any ideas on that one? – user1686402 Mar 26 '18 at 22:15
  • Actually it worked because it works on average and not actual healthy hosts. All good! – user1686402 Mar 27 '18 at 05:37
  • Glad it's working. The `max` should have worked, I'd suspect, because it should be measuring _has the max gone above 1 **within** the last minute_, rather than _has the max gone above 1 **for** the last minute_. – MrDuk Mar 27 '18 at 13:37
  • Unfortunately it's still not working reliably. I think it's because it only polls every minute, and there are two healthy hosts for such a short amount of time that it often misses. We are getting intermittent results. This is quite frustrating! Any other ideas on how we can achieve what we want? – user1686402 Mar 28 '18 at 00:44
  • Are you sure you're selecting `1 out of 1` datapoints for the alarm? If you've set `1 out of 1` for `1 minute` intervals, with a threshold of `max >= 1`, CloudWatch should be checking what `max` has been set to for every `1 minute`, and if it's more than `1`, trigger an alarm. If that's not being triggered, it sounds like the metric isn't getting updated (are you sure your hosts are actually up at the same time for a little while? How long do you see two `InService` hosts in your ASG? How long do you see two `Healthy` hosts on your ELB?) – MrDuk Mar 28 '18 at 16:24
  • Sorry I can't edit my comment, but I meant to say `max > 1` -- I'll add a screenshot example. – MrDuk Mar 28 '18 at 16:31
  • The flow as I understand it with ECS is that when a new image is pushed to ECR the deployment begins, a new target is spawned, and then health checks are run. Once it is verified as healthy, the other target is drained. When I've observed alarms that have worked, there have only been 2 healthy targets for 1 second. I guess it must be a timing issue, where sometimes the old target starts draining before the new one is registered as healthy. Unfortunately I can't think of any other way of doing it :( – user1686402 Mar 28 '18 at 23:12
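
The timing problem described in these comments comes from relying on the healthy count briefly exceeding the desired count. A different technique, not proposed anywhere in the thread and offered here only as a hedged sketch, is to compare which targets are healthy between two snapshots, so a new target is noticed even if the count never rises. The sketch assumes a target group queried through the elbv2 `describe_target_health` call; the function names and the idea of polling on a schedule are hypothetical.

```python
def healthy_targets(response):
    """Return the set of (id, port) pairs reported as 'healthy'.

    `response` has the shape returned by elbv2 describe_target_health.
    """
    return {
        (d["Target"]["Id"], d["Target"].get("Port"))
        for d in response.get("TargetHealthDescriptions", [])
        if d["TargetHealth"]["State"] == "healthy"
    }


def newly_healthy(previous, response):
    """Compare a snapshot against the previously seen healthy set.

    Returns (targets that just became healthy, current healthy set).
    """
    current = healthy_targets(response)
    return current - previous, current


def fetch_target_health(target_group_arn):
    """Fetch a live snapshot. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the comparison logic runs without AWS

    client = boto3.client("elbv2")
    return client.describe_target_health(TargetGroupArn=target_group_arn)
```

A scheduled job could call `fetch_target_health`, pass the result to `newly_healthy` along with the set saved from the previous run, and send a notification whenever the first returned set is non-empty. How often to poll and where to store the previous set are left open here.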