
Many of you have probably completed or are contemplating Green-IT projects with the goal of powering off idle or unneeded systems when demand for computer resources is low.

How did you handle this situation in your system monitoring? I'm especially interested in solutions for Nagios.

One idea is to schedule downtime in Nagios for the powered-off hosts. However, the drawback of this solution is that the hosts would still be listed in the 'Problems' view of the Nagios web interface. Is there a better solution without this "pollution" (i.e. where the 'Problems' view only shows real problems that require maintenance from a system administrator)?

A clean solution would be a new 'Green-IT poweroff' host state. But AFAIK this does not exist, does it? Do you have any other recommendations or solutions? What's the best way to monitor a dynamic IT environment?

knweiss
  • Have you tried using a network control system that - ah - supports that? I think it would be something Nagios would have to deal with, or have to document (maybe a script to turn them off for monitoring)? – TomTom Feb 16 '12 at 13:21
  • Maintenance hosts are already flagged with a unique icon along with notifications being turned off. It's not clear to me what more you are looking for. – uSlackr Feb 16 '12 at 13:30
  • @uSlackr I don't want to see dozens/hundreds of hosts in the Problems view - even if they don't send notifications. – knweiss Feb 16 '12 at 21:55
  • Regarding the 'Problems' view issue: if we decide to restrict ourselves to using only the 'Unhandled' services/hosts view, it's possible to hide the powered-off hosts by simply acknowledging them, e.g. with the comment 'Green-IT'. – knweiss Feb 20 '12 at 10:07

2 Answers


The easy way:

There are built-in filters for the status view, at the top of the page. You can just have the admins watch "unacknowledged" problems, or problems on hosts that are not in scheduled downtime. Or any other number of combinations.

If you really want to go wild with filtering the CGI view, see the "HOST AND SERVICE FILTER PROPERTIES" section of cgiutils.h in the source code for a full list of filters that are available.
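As a rough sketch of how those filters combine: the property values are bit flags, so several filters are applied by OR-ing (adding) them. The constants below are the values as I recall them from cgiutils.h, so please check them against your Nagios source. The base URL is just a placeholder for your own installation, and `host_problems_url` is a made-up helper name.

```python
from urllib.parse import urlencode

# Host property filter bits, as recalled from cgiutils.h
# (assumption: verify against the source of your Nagios version).
HOST_NO_SCHEDULED_DOWNTIME = 2
HOST_STATE_UNACKNOWLEDGED = 8

# Host status type bits, also as recalled from cgiutils.h.
HOST_DOWN = 4
HOST_UNREACHABLE = 8


def host_problems_url(base_url: str) -> str:
    """Build a status.cgi URL that lists only unhandled host problems.

    Hosts that are in scheduled downtime or already acknowledged
    (e.g. powered-off Green-IT hosts) are filtered out.
    """
    params = {
        "hostgroup": "all",
        "style": "hostdetail",
        # Only hosts that are DOWN or UNREACHABLE.
        "hoststatustypes": HOST_DOWN | HOST_UNREACHABLE,
        # ...that are neither in downtime nor acknowledged.
        "hostprops": HOST_NO_SCHEDULED_DOWNTIME | HOST_STATE_UNACKNOWLEDGED,
    }
    return f"{base_url}/status.cgi?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder base URL; replace with your own cgi-bin location.
    print(host_problems_url("http://nagios.example.com/nagios/cgi-bin"))
```

Bookmarking the resulting URL gives the admins a 'Problems' view without the hosts that were powered off on purpose.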

The hard way:

See the docs on adaptive monitoring. With this, you can change the Nagios configuration on the fly, as systems are automatically powered off/on. For example, you can adjust the check periods, change the check commands to a check_dummy variant, enable/disable event handlers, etc.
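A minimal sketch of what the power-off/power-on hook could look like, assuming the changes are submitted as external commands through the Nagios command file. `CHANGE_HOST_CHECK_COMMAND`, `DISABLE_HOST_SVC_CHECKS` and `ENABLE_HOST_SVC_CHECKS` are standard external commands; the command file path, the `check_dummy_ok` command name and the `check-host-alive` default are assumptions that have to match your own configuration.

```python
import time
from typing import List, Optional

# Assumption: the real location is whatever command_file is set to in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def external_command(name: str, *args: object, now: Optional[int] = None) -> str:
    """Format one line in the external command syntax: [time] NAME;arg1;arg2"""
    timestamp = int(time.time()) if now is None else now
    return f"[{timestamp}] " + ";".join([name, *map(str, args)]) + "\n"


def power_off_commands(host: str, dummy_command: str = "check_dummy_ok",
                       now: Optional[int] = None) -> List[str]:
    """Commands to send just before a host is powered off.

    dummy_command is a hypothetical check_dummy-based command that must
    already be defined in the Nagios object configuration.
    """
    return [
        external_command("CHANGE_HOST_CHECK_COMMAND", host, dummy_command, now=now),
        external_command("DISABLE_HOST_SVC_CHECKS", host, now=now),
    ]


def power_on_commands(host: str, original_command: str = "check-host-alive",
                      now: Optional[int] = None) -> List[str]:
    """Commands to send once the host is back; restores the real host check."""
    return [
        external_command("CHANGE_HOST_CHECK_COMMAND", host, original_command, now=now),
        external_command("ENABLE_HOST_SVC_CHECKS", host, now=now),
    ]


def submit(lines: List[str], command_file: str = COMMAND_FILE) -> None:
    """Write the commands to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.writelines(lines)


if __name__ == "__main__":
    # Print instead of submitting, so the sketch is safe to run anywhere.
    print("".join(power_off_commands("web01")), end="")
```

The power management tooling would call `submit(power_off_commands(host))` before shutting a host down and `submit(power_on_commands(host))` after it comes back. Nagios must have `check_external_commands` enabled for this to take effect.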

Keith
  • Oh, I forgot to mention: the filters are a bit-field, if that isn't obvious. So, to apply multiple filters, you have to add together their values. – Keith Feb 21 '12 at 17:03

I think you need a bit of custom development to create a new status view that removes hosts with scheduled downtime from the list of problem servers. I suspect someone in the Nagios dev community would be available to do this for a fee.

uSlackr