I'm developing a web application (django/gunicorn/nginx) that needs to be scaled out according to load. The app will be hosted on Linodes so I intend to use StackScripts (and maybe Puppet) to start up new instances of the web server, then stick them behind a NodeBalancer.

It looks as if Nagios and Munin will allow me to monitor load, get alerts when the server is under pressure, and view some pretty graphs. But will those applications also allow me to automate the deployment of a new Linode? It looks as if I should be able to write a Nagios event handler that launches the StackScript. But I'm not sure it's possible to create a check that can determine when to start a new instance.

  • Is it possible to set up an alert that takes past measurements into account? My criteria will be based on how long a machine is under load, rather than an instantaneous reading. I'm not worried if the web server is close to max usage for one check but I may be if it stays that way for two or more.
  • Am I missing a piece? I'm thinking this is possible in a plugin that uses data already available to Nagios/Munin. But maybe I need to write a separate app or script that stores previous check values and does the comparisons.
  • Can anyone point me to an example of using Nagios to scale out an app? I've seen a few slideshows where people discuss scaling this way (usually on EC2) but no concrete examples.

Thanks.

Andrew
  • I think Nagios can do what you're looking for. I just skimmed this document: http://nagios.sourceforge.net/docs/3_0/eventhandlers.html – Publiccert Feb 06 '12 at 19:50
  • I know I can write an event handler to launch a StackScript. I don't know how I'd create the check necessary to set it off though. – Andrew Feb 06 '12 at 19:56
  • Did you read the next article about checks? I know for a fact what you want to do can be done and is very well documented throughout the link I gave you. – Publiccert Feb 06 '12 at 20:01
  • I read the documentation. It does not answer the three specific questions I asked. – Andrew Feb 06 '12 at 20:20

1 Answer

You can set the Nagios load check to recheck multiple times over a set time period before firing the event handler and/or alert. For example, if the load hits a critical threshold, Nagios can recheck it once a minute for 10 minutes and, if it still shows critical load, fire the event handler/alert.

It doesn't really check the history, but looks to see whether the state has changed from its most recent check.
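
As a rough illustration of the idea above, here is a minimal Python sketch of the decision an event handler script could make. It assumes the handler command passes the standard `$SERVICESTATE$` and `$SERVICESTATETYPE$` macros as arguments, and that the service's `max_check_attempts` and `retry_interval` are set so that a HARD state means the load stayed critical across all the rechecks. Nagios may also invoke event handlers while it is still retrying (SOFT state), so the script checks for HARD itself. The function names and the `launch` callback are hypothetical; the actual StackScript deployment call is not shown.

```python
def should_scale_out(state, state_type):
    """Return True only for a sustained problem.

    state      -- value of $SERVICESTATE$ (OK, WARNING, CRITICAL, UNKNOWN)
    state_type -- value of $SERVICESTATETYPE$ (SOFT while Nagios is still
                  retrying, HARD once every retry has also failed)
    """
    return state == "CRITICAL" and state_type == "HARD"


def handle_event(state, state_type, launch):
    """Call launch() only for a HARD CRITICAL event; return whether it ran.

    launch is a hypothetical callback that would deploy a new instance,
    for example by starting a Linode from a StackScript.
    """
    if should_scale_out(state, state_type):
        launch()
        return True
    return False
```

With this approach, a single critical reading only produces SOFT events, which the handler ignores; a new instance is requested only once the rechecks have all come back critical.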

Craig
  • And that "once a minute" check is set using the retry_interval option on the service? – Andrew Feb 07 '12 at 20:42
  • Yes. By default the value is a number of minutes. You can change the 'units' to seconds, or hours, if you wish. – Craig Feb 07 '12 at 21:17