How do I set up CloudWatch to detect when an EC2 instance goes down?

Question

I've got an app running on AWS. How do I set up Amazon CloudWatch to notify me when the EC2 instance fails or is no longer responsive?

I went through the CloudWatch screens, and it appears that you can monitor certain statistics, like CPU or disk utilization, but I didn't see a way to monitor an event like "the instance got an http request and took more than X seconds to respond."

score 15 · Answer 1 · edited Sep 28 '15 at 20:42

Amazon's Route 53 Health Check is the right tool for the job.

Route 53 can monitor the health and performance of your application as well as your web servers and other resources.

You can set up HTTP resource checks in Route 53 that will trigger an e-mail notification if the server is down or responding with an error.

http://eladnava.com/monitoring-http-health-email-alerts-aws/
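As a rough sketch of the HTTP check described above, the request for a Route 53 health check could be built like this with boto3. The domain, path, interval and threshold values are assumptions for illustration, and the helper names are hypothetical. The e-mail part is not shown: it would still need a CloudWatch alarm on the health check's status with an SNS topic attached.

```python
import uuid


def build_health_check_request(domain, path="/", port=80):
    """Build keyword arguments for Route 53's CreateHealthCheck call."""
    return {
        # CallerReference must be unique per request; it guards against duplicates.
        "CallerReference": str(uuid.uuid4()),
        "HealthCheckConfig": {
            "Type": "HTTP",
            "FullyQualifiedDomainName": domain,
            "Port": port,
            "ResourcePath": path,
            "RequestInterval": 30,  # seconds between checks (assumed value)
            "FailureThreshold": 3,  # consecutive failures before "unhealthy" (assumed)
        },
    }


def create_health_check(domain, path="/", port=80):
    """Send the request; needs boto3 and AWS credentials (not exercised here)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("route53")
    return client.create_health_check(**build_health_check_request(domain, path, port))
```

Calling `create_health_check("example.com", "/health")` would register the check; the builder is kept separate so the parameters can be inspected without touching AWS.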

edited Sep 28 '15 at 20:42

Elad Nava

answered Apr 17 '14 at 19:13

Steven Cogorno

Thanks Steven, I didn't realise you could do that (even for domains not on Route 53). 50c per health check per month which is much cheaper than pingdom and running your own EC2 instance if it's just one or two. – spidie Jun 04 '15 at 09:35

score 13 · Answer 2 · answered Jan 02 '14 at 02:29

To monitor an event in CloudWatch you create an Alarm, which monitors a metric against a given threshold.

When creating an alarm you can add an "action" for sending a notification. AWS handles notifications through SNS (Simple Notification Service). You can subscribe to a notification topic, and then you'll receive an email when your alarm fires.

For EC2 metrics like CPU or disk utilization this is the guide from the AWS docs: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/US_AlarmAtThresholdEC2.html

As answered already, use an ELB to monitor HTTP.

This is the list of available metrics for ELB: http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/US_MonitoringLoadBalancerWithCW.html#available_metrics

To answer your specific question, for monitoring X seconds for the http response, you would set up an alarm to monitor the ELB "Latency".
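A minimal sketch of such an alarm with boto3, assuming a Classic ELB and an SNS topic that already exists. The alarm name, the 2-second threshold and the three one-minute periods are illustrative assumptions, not values from the AWS docs.

```python
def build_latency_alarm(load_balancer_name, sns_topic_arn, max_seconds=2.0):
    """Build keyword arguments for CloudWatch's PutMetricAlarm call."""
    return {
        "AlarmName": f"{load_balancer_name}-high-latency",  # hypothetical name
        "Namespace": "AWS/ELB",
        "MetricName": "Latency",  # reported in seconds
        "Dimensions": [{"Name": "LoadBalancerName", "Value": load_balancer_name}],
        "Statistic": "Average",
        "Period": 60,            # evaluate one-minute datapoints
        "EvaluationPeriods": 3,  # alarm after three breaching periods in a row
        "Threshold": max_seconds,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that sends the email
    }


def create_alarm(params):
    """Create the alarm; needs boto3 and AWS credentials (not exercised here)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

The "X seconds" from the question maps onto `Threshold`; subscribing an email address to the SNS topic delivers the notification.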

score 8 · Answer 3 · answered Feb 18 '12 at 17:03

CloudWatch monitoring is just as you have discovered. You can infer that one of your instances is frozen by looking at the metrics, but CloudWatch won't, for example, send you an email when your app is down or too slow.

If you are looking for some sort of notification when your app or instance is down, I suggest using a monitoring service. Pingdom is a good option. You can also set up a new instance on AWS and install a monitoring tool like Nagios, which would be my preferred option.

Good practices that always pay off in the long run: load balancing (Amazon ELB), more than one instance running your app, Auto Scaling (when an instance is down, Amazon will automatically start a new one and maintain your SLA), and custom monitoring.

My team used a custom monitoring script for a long time, and we always knew of failures as soon as they occurred. Basically, if we had two nodes running our app, node 1 sent HTTP requests to node 2, and node 2 to node 1. If any request took longer than expected, or returned an unexpected HTTP status or response body, the script sent an email to the system admins. Nowadays, we rely on more robust approaches, like Nagios, which can even monitor operating system stuff (threads, etc.), application servers (connection pool health, etc.) and so on. It's worth every cent invested in setting it up.
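A script along those lines might look like the sketch below. This is not the team's actual script: the 2-second limit, the expected "OK" text and the function names are assumptions, and sending the email is left out.

```python
import time
import urllib.error
import urllib.request


def evaluate(status, elapsed, body, max_seconds=2.0, expected_text="OK"):
    """Return a list of problems; an empty list means the peer looks healthy."""
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    if elapsed > max_seconds:
        problems.append(f"slow response: {elapsed:.2f}s")
    if expected_text not in body:
        problems.append("unexpected response body")
    return problems


def probe(url, timeout=10.0, **limits):
    """Request the peer node's URL and evaluate status, timing and body."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:  # server answered with 4xx/5xx
        status, body = exc.code, ""
    except OSError as exc:  # connection refused, DNS failure, timeout
        return [f"request failed: {exc}"]
    return evaluate(status, time.monotonic() - start, body, **limits)
```

Run from cron on each node against the other node's URL, a non-empty result from `probe` would be the trigger for mailing the admins.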

score 5 · Answer 4 · answered Mar 10 '13 at 07:53

CloudWatch recently added "status check" metrics that will answer one of your questions: whether an instance is down or not. It does not send a request to your web server; it performs a system check instead. As a previous answer suggests, use ELB for HTTP health checks.
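An alarm on that metric could be sketched as follows. The alarm name and the evaluation settings are assumptions; `StatusCheckFailed` reports 1 when either the instance or the system check fails.

```python
def build_status_check_alarm(instance_id, sns_topic_arn):
    """Build PutMetricAlarm arguments for a failed EC2 status check."""
    return {
        "AlarmName": f"{instance_id}-status-check-failed",  # hypothetical name
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,  # two failing minutes in a row (assumed)
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }

# To apply it (needs boto3 and credentials):
#   boto3.client("cloudwatch").put_metric_alarm(**build_status_check_alarm(...))
```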

score 2 · Answer 5 · answered Jun 19 '12 at 07:15

You could always have another instance for tools/testing. That instance would send the HTTP request on a schedule and measure the response time. You could then publish that response time to CloudWatch as a custom metric and set an alarm for when it goes over a certain threshold.

You could even do that from the instance itself.
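One way to sketch the measure-and-publish step, assuming boto3 is available on the instance. The namespace `Custom/WebApp` and the metric name `ResponseTime` are made up for illustration.

```python
import time
import urllib.request


def measure_response_time(url, timeout=10.0):
    """Return how long a full GET of the URL takes, in seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return time.monotonic() - start


def build_metric_data(url, seconds):
    """Build keyword arguments for CloudWatch's PutMetricData call."""
    return {
        "Namespace": "Custom/WebApp",  # hypothetical namespace
        "MetricData": [
            {
                "MetricName": "ResponseTime",  # hypothetical metric name
                "Dimensions": [{"Name": "Url", "Value": url}],
                "Value": seconds,
                "Unit": "Seconds",
            }
        ],
    }


def publish(url):
    """Measure once and publish; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3

    seconds = measure_response_time(url)
    boto3.client("cloudwatch").put_metric_data(**build_metric_data(url, seconds))
```

Scheduling `publish` from cron every minute gives the alarm a steady stream of datapoints to compare against the threshold.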

Tarek Koudsi · Answer 6 · 2014-01-18T20:32:23.640

As Kurst Ursan mentioned above, using "Status Check" metrics is the way to go. In some cases you won't be able to browse those metrics (e.g. if you're using AWS OpsWorks), so you'll have to report a custom metric on your own. However, you can set up an alarm on a metric that always matches (so it stays in the OK state) and have the alarm trigger when the state changes to "INSUFFICIENT DATA". Technically, this means CloudWatch can't tell whether the state is OK or ALARM because it can't reach your instance, i.e. your instance is offline.

score 0 · Answer 7 · answered Jul 09 '19 at 23:20

There are a bunch of ways to get instance health info. Here are a couple.

Watch for instance status checks and EC2 events (planned downtime) in the EC2 API. You can poll those and send them to CloudWatch to create an alarm.
Create a simple daemon on the server that writes to DynamoDB every second (this has better granularity than CloudWatch). Have a second process query the heartbeats and alert when one is missing.
Put all instances behind a load balancer with a dummy port open that gives a TCP response. Set up TCP health checks on the ELB, and alert on unhealthy instances.
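The checking side of the heartbeat option can be sketched in a few lines. It assumes the second process has already read each instance's latest heartbeat timestamp from the table into a dict; the 5-second limit and the function name are illustrative.

```python
def missing_heartbeats(last_seen, now, max_age=5.0):
    """Return the IDs whose most recent heartbeat is older than max_age seconds.

    last_seen maps an instance ID to the Unix timestamp of its latest heartbeat.
    """
    return sorted(
        instance_id
        for instance_id, timestamp in last_seen.items()
        if now - timestamp > max_age
    )


# Example: "i-b" last reported 8 seconds ago, so it is flagged.
stale = missing_heartbeats({"i-a": 100.0, "i-b": 93.0}, now=101.0)
print(stale)  # ['i-b']
```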

Unless you use a product like Blue Matador (automatically notifies you of production issues), it's actually quite heinous to set something like this up, let alone maintain it. That said, if you're going down this road and want some help getting started with CloudWatch (terminology, alerts, logs, etc.), start with this blog: How to Monitor Amazon EC2 with CloudWatch

score 0 · Answer 8 · answered Feb 13 '20 at 14:01

You can use a CloudWatch Events rule to monitor whenever any EC2 instance goes down. You can create the rule from the CloudWatch console as follows:

In the CloudWatch console, choose Events -> Rules.

For Event Pattern:
For Service Name, choose EC2.
For Event Type, choose EC2 Instance State-change Notification.
For Specific States, choose Stopped.

In Targets, choose any previously created SNS topic for sending a notification.

Source : Create a Rule - https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CloudWatch-Events-Input-Transformer-Tutorial.html#input-transformer-create-rule

This is not exactly a CloudWatch alarm, but it serves the purpose of monitoring/notification.
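The same rule can probably be created from code as well. The sketch below uses boto3's `events` client; the rule name and target ID are made up, and the SNS topic is assumed to exist already with a policy that lets CloudWatch Events publish to it.

```python
import json


def build_stopped_rule(rule_name="ec2-instance-stopped"):
    """Build PutRule arguments matching EC2 instances entering 'stopped'."""
    pattern = {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": ["stopped"]},
    }
    return {
        "Name": rule_name,  # hypothetical rule name
        "EventPattern": json.dumps(pattern),
        "State": "ENABLED",
    }


def create_rule(sns_topic_arn, rule_name="ec2-instance-stopped"):
    """Create the rule and attach the SNS topic; needs boto3 and credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    events = boto3.client("events")
    events.put_rule(**build_stopped_rule(rule_name))
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "notify-sns", "Arn": sns_topic_arn}],  # hypothetical Id
    )
```

Adding other states such as `"terminated"` to the `state` list would widen the rule beyond stopped instances.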
