I'd like to use Nagios to monitor the redundant PSUs in my servers (running Debian Wheezy).

I've run the sensors-detect script in the lm-sensors package, and the only thing it can find is

Driver `ipmisensors':
  * ISA bus, address 0xca2
    Chip `IPMI BMC KCS' (confidence: 8)

I then installed freeipmi-tools, and I find that I can get some useful output from ipmi-sensors:

$ sudo ipmi-sensors --group='Power Supply'
5: Power Supply 1 (Power Supply): [Presence detected]
6: Power Supply 2 (Power Supply): [Presence detected]
7: Power Supplies (Power Supply): [Fully Redundant]

I can write a Nagios plugin to run ipmi-sensors locally, parse its output, and alert if it changes, but I'm reluctant to rely on the output format staying the same, and I can't figure out how to get more machine-readable output.
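Here is a minimal Python sketch of that plugin approach. It assumes the exact line format shown in the ipmi-sensors output above (`ID: Name (Type): [State]`), and it treats the three sensor names and states from this particular server as the healthy baseline. The `EXPECTED` table and the function names are hypothetical. The exit codes 0/1/2/3 follow the standard Nagios plugin convention (OK/WARNING/CRITICAL/UNKNOWN).

```python
import re
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Matches the line format shown in the question, e.g.
#   "7: Power Supplies (Power Supply): [Fully Redundant]"
# Assumption: this format stays stable for your freeipmi version.
LINE_RE = re.compile(r"^\s*(\d+):\s*(.+?)\s*\(([^)]*)\):\s*\[(.*)\]\s*$")

# Hypothetical baseline: the states a healthy server reported in the question.
EXPECTED = {
    "Power Supply 1": "Presence detected",
    "Power Supply 2": "Presence detected",
    "Power Supplies": "Fully Redundant",
}


def parse(output):
    """Map each sensor name to its bracketed state text."""
    states = {}
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match:
            states[match.group(2)] = match.group(4)
    return states


def evaluate(states, expected=EXPECTED):
    """Return (exit_code, status_line) for Nagios."""
    if not states:
        # Nothing matched: the format probably changed, so don't report OK.
        return UNKNOWN, "UNKNOWN - could not parse ipmi-sensors output"
    problems = [
        f"{name}: {states.get(name, 'missing')}"
        for name, want in expected.items()
        if states.get(name) != want
    ]
    if problems:
        return CRITICAL, "CRITICAL - " + "; ".join(problems)
    summary = "; ".join(f"{n}: {s}" for n, s in sorted(states.items()))
    return OK, "OK - " + summary


def main():
    """Run ipmi-sensors locally and print a one-line Nagios status."""
    try:
        result = subprocess.run(
            ["ipmi-sensors", "--group=Power Supply"],
            capture_output=True, text=True, check=True, timeout=30,
        )
    except (OSError, subprocess.SubprocessError) as exc:
        print(f"UNKNOWN - ipmi-sensors failed: {exc}")
        return UNKNOWN
    code, message = evaluate(parse(result.stdout))
    print(message)
    return code
```

To use it as a plugin, add `import sys` and end the script with `sys.exit(main())`. Nagios runs plugins as an unprivileged user, and ipmi-sensors needs root to reach the local BMC, so you would typically invoke the script through sudo (as in the question) with a matching sudoers entry. Because the regex is tied to the current format, a format change either makes `parse` return nothing (UNKNOWN) or makes expected sensors go "missing" (CRITICAL). Either way, the check fails loudly rather than silently reporting OK.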

I've looked at check_ipmi_sensor, but it seems only to operate where the IPMI device is available on the network; mine is not.

Is there a better way than parsing the output of ipmi-sensors?

Flup
  • I'm not so familiar with Nagios, but I'd be really surprised if someone hasn't written a plugin or whatever it's called for local IPMI devices already. This is a common way to monitor hardware. – Chris S Aug 29 '14 at 14:46
  • Me too :) I suspect my -foo isn't good enough on a Friday afternoon. – Flup Aug 29 '14 at 14:49

2 Answers


There are several other plugins for IPMI listed on Nagios Exchange. This is (sometimes) a better place to start looking than Google.

For example:

Keith

There is no reason to parse the IPMI data as text. Reading it takes one CPU thread and parsing it takes another, and if you are scaling to data-center-size systems with thousands of servers, that is a lot of threads. Instead, use an API, either Java (Vrx or Hemi) or a C library (ipmitool or freeipmi), to access the IPMI data directly. Data centers (40k servers) can read 6 million IPMI sensors per minute, and at that scale thread creation becomes the limiting factor.
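To put those figures in perspective, here is the simple arithmetic on the answer's own numbers (no additional measurements are implied):

```latex
\frac{6\,000\,000\ \text{readings/min}}{60\ \text{s/min}} = 100\,000\ \text{readings/s},
\qquad
\frac{6\,000\,000\ \text{readings/min}}{40\,000\ \text{servers}} = 150\ \text{readings per server per minute}
```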

The advantage of an API is that IPMB bus write errors, such as the bus being busy or having a permanent hardware error, are reported to you, so you can decide whether to retry retrieving the data.
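The retry decision described above can be sketched generically. This assumes a hypothetical `read_sensor` callable that wraps whichever IPMI library you use. It also assumes hypothetical exception classes that separate a busy bus (transient) from a permanent hardware fault. Real libraries report these errors in their own ways, so mapping them onto these two classes is up to you.

```python
import time


class TransientBusError(Exception):
    """Hypothetical: the IPMB bus was busy; a retry may succeed."""


class PermanentBusError(Exception):
    """Hypothetical: a hardware fault; retrying will not help."""


def read_with_retry(read_sensor, attempts=3, delay=0.5):
    """Call read_sensor(), retrying only on transient bus errors.

    Waits delay, 2*delay, 4*delay, ... between attempts (exponential
    backoff). PermanentBusError and any other exception propagate at once.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return read_sensor()
        except TransientBusError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last bus error
            time.sleep(delay * (2 ** attempt))
```

In a monitoring loop, a `PermanentBusError` maps naturally to a CRITICAL alert, while running out of retries on a busy bus might only warrant a WARNING. That policy is a judgment call, not something the answer prescribes.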

Starfish