I have an HPC cluster and I would like to monitor its health with Icinga2. I have a number of checks defined for each node in the cluster, but what I would really like is to get a notification if more than a certain percentage of the nodes are sick.

I noticed that it is possible to define a dummy host which represents the cluster and use the Icinga domain-specific language to achieve something like what I'm interested in (http://docs.icinga.org/icinga2/latest/doc/module/icinga2/chapter/advanced-topics?highlight-search=up_count#access-object-attributes-at-runtime). However, this seems like an inelegant and awkward solution.

Is it possible to define this kind of "aggregate" or "meta check" over a hostgroup?

1 Answer

There wasn't any built-in solution, and putting that example into the docs has helped quite a few users, even if it isn't that elegant. External addons such as Business Process can do the same but require additional configuration. The Icinga Vagrant box, for instance, integrates the corresponding Icinga Web 2 module.

Other users tend to use check_multi or check_cluster for that. That isn't elegant either.
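To make the idea concrete, here is a minimal Python sketch of the calculation such an aggregate check performs: count the non-OK nodes, turn that into a percentage, and compare it against thresholds. It is not the implementation of check_multi or check_cluster; the function name `aggregate_state` and the default thresholds `warn_pct` and `crit_pct` are hypothetical, and how the per-node states are collected is left out.

```python
# Exit codes follow the usual monitoring-plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def aggregate_state(node_states, warn_pct=10.0, crit_pct=25.0):
    """Return (exit_code, message) for a list of per-node states.

    A node counts as sick when its state is non-zero. The thresholds
    are hypothetical defaults and are exceeded only when the sick
    percentage is strictly greater than them.
    """
    total = len(node_states)
    if total == 0:
        return UNKNOWN, "UNKNOWN: no node states supplied"

    sick = sum(1 for state in node_states if state != 0)
    pct = 100.0 * sick / total

    if pct > crit_pct:
        code, label = CRITICAL, "CRITICAL"
    elif pct > warn_pct:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"

    return code, f"{label}: {sick}/{total} nodes sick ({pct:.1f}%)"
```

A wrapper script would print the message and exit with the returned code, so that Icinga2 can treat it like any other plugin result and notify on it.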

There are no immediate plans to implement such a feature, although the idea is good and has been around for a long time.

dnsmichi