
I have an instance running on Amazon EC2 that I turned into a webserver.

I have been looking at CloudWatch, but I do not know if it is the right tool for the job. Basically, I want to be notified when the server is down, for whatever reason.

Whether the server got hacked or shut down for some other reason, I want to get a notification.

I have enabled CloudWatch and tried to set up an alert, but I only see things like network in/out, CPU usage, and similar metrics. I do not know whether these will do the trick.

Saif Bechan
  • Check out Cloudkick. I know we aren't supposed to spread opinions, but Cloudkick is my favorite monitoring solution. Pingdom is pretty useless, because your site can be pingable (Apache/nginx) while the PHP/Java/Ruby processes it proxies to are not functioning. Hitting an actual URL with Cloudkick every 3 minutes and checking for the presence of text you specify, or for a 2xx success, is much more useful and ensures the whole stack is working. I have monitors set up for load average, memory usage, agent connection (server offline or connection dropped), HTTP 2xx success on pages, etc. – iainlbc Dec 16 '11 at 03:05
  • You made a good point here. If only PHP is down, you will still get a 200 response. The link you provided is a good alternative; I will have a look at it when I make my final decision. – Saif Bechan Dec 16 '11 at 06:22
  • @iainlbc You can set Pingdom to hit an actual URL and check for specific text. – ceejayoz Nov 20 '12 at 02:00
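The point raised in these comments, that a real URL check should look for both a 2xx status and some expected text rather than just a ping, can be sketched with the Python standard library. This is a minimal illustration, not how Cloudkick or Pingdom work internally; the URL and marker text you pass in are placeholders for a page and string from your own site.

```python
import urllib.request


def is_healthy(status: int, body: str, expected_text: str) -> bool:
    """Healthy only if the page returned 2xx AND contains the marker text.

    The text check catches the case where the front end (Apache/nginx)
    still answers but the PHP/Java/Ruby backend behind it has failed and
    an error page is served instead.
    """
    return 200 <= status < 300 and expected_text in body


def check_url(url: str, expected_text: str, timeout: float = 10.0) -> bool:
    """Fetch `url` and apply is_healthy(); any network or HTTP error counts as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return is_healthy(resp.status, body, expected_text)
    except OSError:
        # HTTPError (4xx/5xx), URLError and socket timeouts are all OSError subclasses.
        return False
```

Run from a second machine (for example, every few minutes via cron), something like `check_url("http://www.example.com/", "Welcome")` returns `False` whenever the stack cannot render that page.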

6 Answers


One recommendation is to monitor a metric that should always have a numeric value, such as CPU usage, and trigger an alarm when the metric's state is 'insufficient data'. You can use Amazon SNS to notify you of this.
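A minimal sketch of that idea, assuming boto3 and an existing SNS topic (the instance ID and topic ARN are placeholders). It builds parameters for CloudWatch `PutMetricAlarm` on the built-in `CPUUtilization` metric and routes the `INSUFFICIENT_DATA` state to SNS; the threshold is chosen so the alarm never fires on real CPU values, only on missing ones.

```python
def cpu_heartbeat_alarm(instance_id: str, topic_arn: str) -> dict:
    """PutMetricAlarm parameters: notify SNS when CPUUtilization stops reporting."""
    return {
        "AlarmName": f"{instance_id}-no-cpu-data",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,  # basic EC2 monitoring publishes every 5 minutes
        "EvaluationPeriods": 2,
        # CPU can never be below 0, so real datapoints always keep the alarm OK.
        "Threshold": 0.0,
        "ComparisonOperator": "LessThanThreshold",
        # "missing" lets the alarm move to INSUFFICIENT_DATA when datapoints stop.
        "TreatMissingData": "missing",
        "InsufficientDataActions": [topic_arn],
    }


def create_alarm(params: dict) -> None:
    """Send the parameters to CloudWatch (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without the SDK

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Usage, with placeholder values: `create_alarm(cpu_heartbeat_alarm("i-0123456789abcdef0", "arn:aws:sns:us-east-1:123456789012:alerts"))`.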

Alternatively, you can set up custom metrics that return a binary state for specific services (httpd, MySQL, etc.) and generate an alert any time one of them reads 0. This approach offers much finer detail; combine it with 'insufficient data' to cover all cases.
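The alarm side of that approach can be sketched as CloudWatch `PutMetricAlarm` parameters. The namespace `Custom/Services` and metric names such as `httpdUp` are hypothetical; they just have to match whatever your watchdog publishes. Here `TreatMissingData="breaching"` is one way to fold the 'insufficient data' case into the same alarm, so a watchdog that goes silent is treated like a service reading 0.

```python
def service_down_alarm(service: str, instance_id: str, topic_arn: str,
                       namespace: str = "Custom/Services") -> dict:
    """PutMetricAlarm parameters that fire when a 0/1 service metric reads 0."""
    return {
        "AlarmName": f"{instance_id}-{service}-down",
        "Namespace": namespace,  # hypothetical custom namespace
        "MetricName": f"{service}Up",  # hypothetical metric name, e.g. "httpdUp"
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Minimum",  # any 0 within the period counts as down
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        # No data at all (watchdog or instance dead) is treated as "down" too.
        "TreatMissingData": "breaching",
        "AlarmActions": [topic_arn],
    }
```

The resulting dict can be passed to `boto3.client("cloudwatch").put_metric_alarm(**params)`, one alarm per service.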

You may be more successful using something that actually monitors your site (e.g. Pingdom, UptimeRobot, etc).

cyberx86
  • I have looked at services like Pingdom, and I will certainly enable them. I was also thinking of watching the CPU, but wondered what would happen if just the webserver is hacked. Having a test for httpd, MySQL, etc. would certainly help. *Can you maybe give me a hint on how to enable this binary state metric?* – Saif Bechan Dec 16 '11 at 02:55
  • Briefly: determine a way to get the status of the service of interest (e.g. [ps|grep](http://blog.eracc.com/2010/05/08/linux-monitor-a-service-with-a-watchdog-script/) for the pid/name, or [check for a port in use](http://bash.cyberciti.biz/monitoring/monitor-unix-linux-network-services/)), i.e. a watchdog script. Modify that script to call the CloudWatch API (PutMetricData), passing either 0 (down) or 1 (up). It is best to use one of the SDKs that exist for this purpose (e.g. Ruby, PHP); the command-line version, mon-put-data, is slower. Run it all with cron. – cyberx86 Dec 16 '11 at 03:23
  • OK, that sounds a little complicated. I think a service like UptimeRobot would be a better choice for me personally. Thanks for all the help. – Saif Bechan Dec 16 '11 at 06:21
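For reference, the watchdog described in cyberx86's comment might look roughly like this in Python with boto3, used here in place of the Ruby/PHP SDKs mentioned. It uses the "port in use" check rather than `ps | grep`; the service-to-port map, the `Custom/Services` namespace and the `<service>Up` metric names are hypothetical choices. It needs AWS credentials (for example, an IAM instance role) when `report()` is actually called.

```python
import socket

# Hypothetical service -> port map; adjust to what runs on your instance.
SERVICES = {"httpd": 80, "mysql": 3306}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Watchdog check: can we open a TCP connection to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def metric_datum(service: str, up: bool, instance_id: str) -> dict:
    """One PutMetricData entry: 1 if the service is up, 0 if it is down."""
    return {
        "MetricName": f"{service}Up",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Value": 1.0 if up else 0.0,
        "Unit": "Count",
    }


def report(instance_id: str, namespace: str = "Custom/Services") -> None:
    """Check every service and push all results to CloudWatch in one call."""
    import boto3  # imported lazily; only needed when actually reporting

    data = [metric_datum(name, port_open("127.0.0.1", port), instance_id)
            for name, port in SERVICES.items()]
    boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=data)
```

Calling `report("i-0123456789abcdef0")` (placeholder ID) from a script that cron runs every minute gives the 0/1 series an alarm can watch. Note that an open port only proves the process is listening, not that PHP behind it works, so it complements rather than replaces a URL check.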

You can use OpsGenie (http://www.opsgenie.com) to send rich alerts for CloudWatch. Currently, CloudWatch has a limited set of alerting mechanisms, including email and SMS via SNS.

You can configure CloudWatch to call the OpsGenie web service API and get the right people notified rapidly via push notifications to iPhone/Android apps, SMS, voice calls, etc., according to each recipient's preferences.

Please take a look at the following blog post for detailed information:

http://www.opsgenie.com/blog/2012/09/04/aws-cloudwatch-alarms-on-your-mobile-with-opsgenie.html
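In general terms, the CloudWatch-to-webhook path goes through SNS: the alarm notifies a topic, and an HTTPS endpoint subscribed to that topic receives the message. A minimal sketch of the SNS `Subscribe` parameters follows; the endpoint URL is a placeholder, and the real one would come from the OpsGenie (or other service's) integration settings, which this sketch does not model.

```python
def https_subscription(topic_arn: str, endpoint_url: str) -> dict:
    """SNS Subscribe parameters that forward an alarm topic to an HTTPS webhook."""
    if not endpoint_url.startswith("https://"):
        # Plain HTTP would send alert payloads unencrypted.
        raise ValueError("endpoint_url must use https://")
    return {"TopicArn": topic_arn, "Protocol": "https", "Endpoint": endpoint_url}
```

The dict can be passed to `boto3.client("sns").subscribe(**params)`. SNS then sends a subscription confirmation request that the receiving service must confirm before alarms are delivered.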

enguzekli

You can implement an EC2 status check alarm. It's done from the EC2 dashboard: go to Instances, select your instance, choose the Status Checks tab (next to the instance description), and click Create Status Check Alarm. The default "Status Check Failed (any)" should be good. I always set the interval to greater than one so I don't get bothered by transient issues.
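Roughly the same alarm can be created through the API instead of the dashboard. This sketch assumes boto3 and an existing SNS topic (instance ID and ARN are placeholders); it watches the built-in `StatusCheckFailed` metric, which is 1 when either the system or the instance status check fails. Requiring several consecutive failing periods mirrors the "more than one" setting above.

```python
def status_check_alarm(instance_id: str, topic_arn: str, periods: int = 2) -> dict:
    """PutMetricAlarm parameters for 'Status Check Failed (any)' on one instance."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    return {
        "AlarmName": f"{instance_id}-status-check-failed",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",  # 1 if system OR instance check failed
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,  # status check metrics are published every minute
        "EvaluationPeriods": periods,  # >1 ignores one-off blips
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],
    }
```

Pass the result to `boto3.client("cloudwatch").put_metric_alarm(**params)`.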

It's also possible to set EC2 to automatically recover your instance if it goes down for some reason.
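Automatic recovery is configured as a CloudWatch alarm whose action is the built-in EC2 recover action. A hedged sketch, assuming boto3 and a placeholder instance ID and region: recovery reacts to the *system* status check (`StatusCheckFailed_System`, i.e. problems with the underlying AWS hardware), and only supported instance types can be recovered this way, so check the EC2 documentation for your instance.

```python
def recover_alarm(instance_id: str, region: str) -> dict:
    """PutMetricAlarm parameters that trigger EC2 auto-recovery on system check failure."""
    return {
        "AlarmName": f"{instance_id}-auto-recover",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 action: move the instance onto healthy hardware.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }
```

Create it with a CloudWatch client in the same region as the instance, e.g. `boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)`. An SNS topic ARN can be added to `AlarmActions` as well, so you are told when a recovery happens.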

I also recommend a secondary monitoring system, and dumb is good for this one. I set up the Linux utility mon, pointed at my webserver from another host. If it fails to get a 200 response code twice in a row, I get an email.
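The "twice in a row" rule is what keeps such a simple checker quiet during one-off blips. Below is a small Python sketch of that rule only. It is not mon's configuration or code, and sending the email is left out; `record()` just says when an alert is due. It alerts once per outage and re-arms after a successful check.

```python
class FailureTracker:
    """Signal an alert only after `threshold` consecutive failed checks."""

    def __init__(self, threshold: int = 2) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive = 0
        self.alerted = False

    def record(self, ok: bool) -> bool:
        """Record one check result; return True exactly when an alert should go out."""
        if ok:
            # A success ends the outage and re-arms the alert.
            self.consecutive = 0
            self.alerted = False
            return False
        self.consecutive += 1
        if self.consecutive >= self.threshold and not self.alerted:
            self.alerted = True  # alert once per outage, not on every failed check
            return True
        return False
```

For example, the check results `[False, True, False, False, False]` produce exactly one alert, on the fourth check.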

jorfus

You can create an alarm in CloudWatch and set it to notify you when it goes into the "Insufficient Data" state. Most of the metrics already available come from the VM host, which has no real idea of what's happening inside your machine.

To start, I'd recommend installing the Amazon tools on your instance, setting up a script to report something (anything: CPU usage, whatever), and alarming if that metric stops sending data (so the metric goes into the Insufficient Data state).

This is only a bare minimum, but should be a good place to start.

See the monitoring scripts section of the CloudWatch Developer Guide: http://docs.amazonwebservices.com/AmazonCloudWatch/latest/DeveloperGuide/mon-scripts.html

Matt Connolly

You can use Route 53 and its health checks. With them, you can send SNS alerts and also redirect your users to a secondary website or an error page. I think this is a better solution for your problem than CloudWatch.
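A Route 53 health check can also be created through the API. This sketch, assuming boto3, builds `CreateHealthCheck` parameters; the domain, path and search string are placeholders. With a search string it uses the `HTTP_STR_MATCH` type, which, like the URL checks discussed in the question's comments, fails if the expected text is missing from the start of the response body, not just on a bad status code.

```python
import uuid


def http_health_check(domain: str, path: str = "/", search_string: str = "") -> dict:
    """Parameters for Route 53 CreateHealthCheck against an HTTP endpoint."""
    config = {
        "FullyQualifiedDomainName": domain,
        "Port": 80,
        "ResourcePath": path,
        "RequestInterval": 30,  # seconds between checks (10 or 30)
        "FailureThreshold": 3,  # consecutive failures before "unhealthy"
    }
    if search_string:
        config["Type"] = "HTTP_STR_MATCH"  # status must be 2xx/3xx AND text present
        config["SearchString"] = search_string
    else:
        config["Type"] = "HTTP"  # status code only
    # CallerReference must be unique per request; a random UUID satisfies that.
    return {"CallerReference": str(uuid.uuid4()), "HealthCheckConfig": config}
```

Pass the result to `boto3.client("route53").create_health_check(**params)`. The DNS failover to a secondary site mentioned above is configured separately, on the record sets.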

Petr

If you want to monitor HTTP endpoints, such as your API or Web site, check out my blog post on how to achieve that with Route 53 Health Checks (even if you don't use Route 53 for DNS):

http://eladnava.com/monitoring-http-health-email-alerts-aws/
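For the alerting half, Route 53 publishes each health check's result to CloudWatch as the `HealthCheckStatus` metric (1 = healthy, 0 = unhealthy), and an alarm on it can notify an SNS topic with an email subscription. A hedged sketch assuming boto3; the health check ID and topic ARN are placeholders, and this is not necessarily the exact setup from the linked post.

```python
def health_check_alarm(health_check_id: str, topic_arn: str) -> dict:
    """PutMetricAlarm parameters: notify SNS when a Route 53 health check goes unhealthy."""
    return {
        "AlarmName": f"route53-{health_check_id}-unhealthy",
        "Namespace": "AWS/Route53",
        "MetricName": "HealthCheckStatus",  # 1 = healthy, 0 = unhealthy
        "Dimensions": [{"Name": "HealthCheckId", "Value": health_check_id}],
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [topic_arn],
    }
```

Route 53 health check metrics are published in the us-east-1 region, so the alarm (and its SNS topic) should be created there, e.g. `boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)`. Email delivery comes from subscribing an address to the topic with `boto3.client("sns", region_name="us-east-1").subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="you@example.com")` (placeholder address).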

Elad Nava