I'm looking at monitoring and alerting for some of the business metrics in our web app (pageviews, signups, etc.). We already use Nagios and Munin for a wide variety of server monitoring and alerting cases, which is why I've started there.
I could write custom plugins for Nagios that calculate our statistics / control charts and check for when these metrics dip below desirable levels (Warning and Critical), but I'd also like to know when these metrics spike above the expected levels (lots more signups - we did something right!).
Is there a way to create custom alert levels in Nagios or Munin to accommodate these positive alerts, or is there another tool I should be looking at for this case? The ideal tool would:
- Include more alert levels (Critical, Warning, OK, Improved, Spiking)
- Allow me to see additional data about the report that generated the alert (expected value of the metric and observed value)
- (Nice to have) allow me to graph the history of the metric so I can visualize the observations after receiving the alert
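
For context, here is a rough sketch of the kind of custom check I have in mind. It is only an illustration under some assumptions: the function names (`classify`, `run_check`), the 2-sigma and 3-sigma thresholds, and the metric name `signups` are all hypothetical, and the control chart is reduced to a simple mean and standard deviation over recent history. Since Nagios plugins can only return the standard exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), the sketch maps the positive states onto WARNING and carries the real label in the output text, which is the workaround I'd like to avoid.

```python
import math
import statistics

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(history, observed, warn_sigma=2.0, crit_sigma=3.0):
    """Return (exit_code, label, expected) for an observed metric value.

    Thresholds are hypothetical: deviations are measured in standard
    deviations from the mean of `history`.
    """
    if len(history) < 2:
        return UNKNOWN, "UNKNOWN", None

    expected = statistics.mean(history)
    sigma = statistics.stdev(history)

    if sigma == 0:
        # Flat history: any change counts as an extreme deviation.
        diff = observed - expected
        z = 0.0 if diff == 0 else math.copysign(math.inf, diff)
    else:
        z = (observed - expected) / sigma

    if z <= -crit_sigma:
        return CRITICAL, "CRITICAL", expected
    if z <= -warn_sigma:
        return WARNING, "WARNING", expected
    # Assumption: positive states reuse the WARNING exit code so that
    # Nagios still sends a notification; the label distinguishes them.
    if z >= crit_sigma:
        return WARNING, "SPIKING", expected
    if z >= warn_sigma:
        return WARNING, "IMPROVED", expected
    return OK, "OK", expected


def run_check(metric, history, observed):
    """Build the plugin's exit code and its single line of output."""
    code, label, expected = classify(history, observed)
    if expected is None:
        return code, f"{label} - not enough history for {metric}"
    # Text before the pipe is the human-readable status; text after it
    # is performance data that graphing add-ons can pick up.
    line = (f"{label} - {metric} observed={observed} "
            f"expected={expected:.1f} | {metric}={observed}")
    return code, line
```

A real plugin would print the returned line and exit with the returned code, for example `code, line = run_check("signups", last_week, today)` followed by `print(line)` and `sys.exit(code)`. This gets the expected and observed values into the alert text, but "Improved" and "Spiking" still show up as warnings in the Nagios UI, hence the question.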