I have the following service set up for nagios:

define service {
  hostgroup_name             LNX
  service_description        /tmp Disk Usage
  check_command              check_nrpe!check_disk!-a '-w 20% -c 10% -p /tmp'
  check_interval             1
  max_check_attempts         3
  retry_interval             1
  check_period               24x7
  notification_interval      2
  notification_period        24x7
  notification_options       c,r,w
  notifications_enabled      0
  contact_groups             devops
}

Which ties to the following command:

define command {
 command_name     check_nrpe
 command_line     $USER1$/check_nrpe -H $HOSTADDRESS$ -u -t 60 -c $ARG1$ $ARG2$
}

So in the end what's being executed (and its output when run on command line) is:

$: /usr/local/nagios/libexec/check_nrpe -H <my host> -u -t 60 -c check_disk -a '-w 20% -c 10% -p /tmp'
DISK OK - free space: /tmp 4785 MB (97% inode=99%);| /tmp=124MB;3928;4419;0;4910

Following this with echo $? yields a 0, meaning OK/success.

However, nagios is reporting this as "error code 255 out of bounds" and I'm not sure why.
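
For context, Nagios only recognises plugin exit codes 0 through 3 and reports anything else as out of bounds, which is why a 255 shows up as an error rather than as a state. A minimal sketch of that mapping (the function name `nagios_state` is made up for illustration and is not part of Nagios):

```shell
#!/bin/sh
# Hypothetical helper: name the state that corresponds to a plugin exit code.
# Codes 0-3 are the documented plugin return codes; anything else is what
# Nagios reports as "out of bounds".
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "out of bounds" ;;
    esac
}

nagios_state 0      # prints: OK
nagios_state 255    # prints: out of bounds
```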

Running the check_disk command on the server works fine:

$: ./check_disk -w 20% -c 10% -p /tmp
DISK OK - free space: /tmp 4785 MB (97% inode=99%);| /tmp=124MB;3928;4419;0;4910
$: echo $?
0

And as shown above, it works when done through the check_nrpe executable on the nagios server. This means:

  1. The command (check_disk) is present on the remote system: command[check_disk]=/usr/local/nagios/libexec/check_disk $ARG1$
  2. The nagios server is able to talk to the remote nrpe (e.g. it can access it on the network and its IP is present in the only_from directive in /etc/xinetd.d/nrpe)

Additionally, this check runs fine on some machines, but not on all of them.

Why does Nagios think it's getting a 255 when everything I can see means it should be getting 0 and thus marking the service as OK?

EDIT: The Nagios server is Nagios Core 4 running on CentOS 7. The hosts being checked run CentOS 5 through 7, and the problem appears on multiple machines of varying versions.

Mitch
  • What about `sudo -u nagios /usr/local/libexec/nagios/check_nrpe ...` ? – user4556274 Aug 05 '16 at 18:53
  • @user4556274 all of the commands above were run as the nagios user (using `sudo su` from my user), but trying `sudo -u nagios...` from my user yields the same successful result – Mitch Aug 05 '16 at 18:56
  • You have not provided the source host OS and the destination OS distros and versions and the version of Nagios you are running. – mdpc Aug 05 '16 at 20:29
  • @mdpc updated at the end of the question – Mitch Aug 08 '16 at 14:24
  • Is SELinux enabled on the system, and if so, have you run the command in the `nagios` user context with `runcon`? `sudo` or `su` by themselves would not respect SELinux context. – user4556274 Aug 08 '16 at 15:19
  • Have you tried this [link](https://paulferrett.com/2011/nagios-return-code-of-255-is-out-of-bounds-for-disk-check/) before? Seems like worth trying. – Simon MC. Cheng Aug 08 '16 at 17:15
  • @SimonMC.Cheng thanks for the link, I had seen that link before, but my Nagios configuration uses NRPE instead of SSH to do the checks – Mitch Aug 08 '16 at 18:31

2 Answers

When your check_command looks like this:

check_command check_nrpe!check_disk

the command that actually runs on the client side is check_disk, not check_nrpe.

Cause of problem

The service definition on the Nagios server asks the monitored client to execute the check_disk command with ONE argument:

-w 20% -c 10% -p /tmp

Your current check_disk command definition in nrpe.cfg on the Nagios client is as shown:

command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$

So the command that actually runs on the monitored client via NRPE is:

/usr/lib64/nagios/plugins/check_disk -w -w 20% -c 10% -p /tmp -c $ARG2$ -p $ARG3$

Therefore, the check fails because the command cannot be executed successfully.
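
The substitution described above can be reproduced with a small sketch. It only models replacing `$ARG1$` with the single argument sent by the server and leaves the other placeholders as written, mirroring the expanded command shown above. It is not NRPE's own code, how NRPE itself treats unset `$ARGn$` macros is not modelled, and the function name `expand_arg1` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace the first literal $ARG1$ in an nrpe.cfg
# command template with the one argument passed via "check_nrpe -a".
expand_arg1() {
    awk -v tpl="$1" -v val="$2" 'BEGIN {
        n = index(tpl, "$ARG1$")             # position of the placeholder
        if (n > 0)                           # "$ARG1$" is 6 characters long
            tpl = substr(tpl, 1, n - 1) val substr(tpl, n + 6)
        print tpl
    }'
}

# Template with three placeholders, but only one argument supplied
expand_arg1 '/usr/lib64/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' \
            '-w 20% -c 10% -p /tmp'

# Template with a single placeholder
expand_arg1 '/usr/lib64/nagios/plugins/check_disk $ARG1$' \
            '-w 20% -c 10% -p /tmp'
```

The first call prints a command line with a doubled `-w`, while the second prints a well-formed check_disk invocation.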

Solution

If you want to pass 3 different arguments to the Nagios client, try modifying your check_command as follows:

check_command check_nrpe!check_disk -a '-w 20% -c 10% -p /tmp'

Make sure you have the corresponding command configured on the Nagios client:

command[check_disk]=/usr/lib64/nagios/plugins/check_disk $ARG1$

Another option would be to change the server configuration as follows:

check_command check_nrpe!check_disk

With the corresponding client configuration:

command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /tmp
Simon MC. Cheng

You should check that the nrpe client is accepting connections from the nagios server.

cat /etc/xinetd.d/nrpe
service nrpe
{
    flags           = REUSE
    socket_type     = stream
    port            = 5666
    wait            = no
    user            = nagios
    group           = nagios
    server          = /usr/local/nagios/bin/nrpe
    server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
    log_on_failure  += USERID
    disable         = no
    only_from       = xxx.xxx.xxx.xxx
}

Confirm that the only_from line contains the IP of your Nagios server.

Second, check that the check_nrpe executable has the right permissions. It should be owned by nagios:nagios.

-rwxrwxr-x. 1 nagios nagios 81542 Jul 11 13:08 /usr/local/nagios/libexec/check_nrpe
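
The ownership check can also be scripted. This is a rough sketch that assumes GNU `stat` (as shipped on CentOS); the helper name `check_owner` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE is owned by the given USER:GROUP
# and is executable by the user running this script.
check_owner() {
    file=$1
    want=$2
    # GNU stat format: %U = owner name, %G = group name
    have=$(stat -c '%U:%G' "$file") || return 2
    [ "$have" = "$want" ] && [ -x "$file" ]
}

# Example usage (path taken from the listing above):
#   check_owner /usr/local/nagios/libexec/check_nrpe nagios:nagios && echo "ownership OK"
```

Run it as the nagios user so that the `-x` test reflects what Nagios itself is allowed to execute.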
xguru
  • The `only_from` line does have the IP, otherwise I wouldn't be able to run it from the nagios server on the command line. Nagios user does own `check_disk` on the remote host. – Mitch Aug 05 '16 at 18:50
  • Do you have any other nrpe checks that do work for this remote host? If not reinstall nrpe to confirm nrpe itself is actually working. – xguru Aug 08 '16 at 19:12