I'm monitoring several hosts using Nagios. This works fine when I use "normal" checks that are executed on the monitoring host (say, check_http). However, I'm having trouble with NRPE-based checks, which are executed through the NRPE service on the monitored host instead.

I have declared my custom commands in the NRPE configuration of the monitored hosts, e.g.

command[check_memory]=/usr/lib/nagios/plugins/check_memory -w 20% -c 10% -u G

I've then created the corresponding Nagios commands in the Nagios configuration on the monitoring host:

define command {
    command_name my_check_nrpe
    command_line /usr/lib/nagios/plugins/check_nrpe -H '$HOSTALIAS$' -c '$ARG1$'
}

define service {
  use                   my-service
  service_description   Free memory
  check_command         my_check_nrpe!check_memory
  check_interval        15
}

These checks work fine when I run them manually on the monitoring host as the nagios user (which the Nagios service runs under):

nagios@monitor:~$ /usr/lib/nagios/plugins/check_nrpe -H 'my.target.host' -c 'check_memory'
MEMORY OK - 0G free | free=956080128b;419844915.2:;209922457.6:
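Nagios derives the service state from the plugin's exit code rather than from the text it prints, so it can be worth checking the exit code of the manual run as well. A minimal sketch, assuming a POSIX shell; report_state is a hypothetical helper name, not part of Nagios or NRPE:

```shell
# Run a plugin command and print the state that corresponds to its exit code.
# Standard plugin exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
report_state() {
    rc=0
    "$@" || rc=$?
    case "$rc" in
        0) state=OK ;;
        1) state=WARNING ;;
        2) state=CRITICAL ;;
        3) state=UNKNOWN ;;
        *) state="out of range (not a standard plugin code)" ;;
    esac
    echo "exit code: $rc -> $state"
}

# Usage on the monitoring host, as the nagios user:
#   report_state /usr/lib/nagios/plugins/check_nrpe -H 'my.target.host' -c 'check_memory'
```

If this reports OK while Nagios keeps sending WARNING notifications, the running daemon is probably not executing the same command or configuration as the manual test.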

However, I continuously receive email warnings from Nagios about the service:

***** Nagios  *****

Notification Type: PROBLEM

Service: Free memory
Host: my.target.host
Address: XXX.XXX.XXX.XXX
State: WARNING

Date/Time: $

Additional Info:

$

I haven't managed to get any more details about the warnings. The Nagios logs on the monitoring host only show that the warnings were sent:

[1500623961] SERVICE NOTIFICATION: my-mailbox;my.target.host;Free memory;WARNING;notify-by-email;(null)
[1500627561] SERVICE NOTIFICATION: my-mailbox;my.target.host;Free memory;WARNING;notify-by-email;(null)
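The bracketed values are Unix epoch timestamps. Converting them makes it easier to correlate the notifications with service restarts. A small sketch, assuming GNU date (for the `-d @N` syntax); epoch_to_utc is a hypothetical helper name:

```shell
# Convert a Unix epoch timestamp from the Nagios log to a readable UTC time.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# Usage:
#   epoch_to_utc 1500623961
#   epoch_to_utc 1500627561
```

The two entries above come out as 2017-07-21 07:59:21 UTC and 2017-07-21 08:59:21 UTC, i.e. exactly 3600 seconds apart.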

I've also activated maximum debugging output for Nagios:

debug_level=-1
debug_verbosity=2

However, /var/lib/nagios3/nagios.debug doesn't contain anything of interest:

[1500630464.420189] [064.1] [pid=21171] Making callbacks (type 9)...
[1500630464.420243] [064.1] [pid=21171] Making callbacks (type 9)...
[1500630464.420308] [064.1] [pid=21171] Making callbacks (type 9)...
[1500630464.420389] [064.1] [pid=21171] Making callbacks (type 9)...
[1500630464.421086] [064.1] [pid=21171] Making callbacks (type 7)...
[1500630464.421767] [064.1] [pid=21174] Making callbacks (type 9)...

Similarly, I've enabled debugging output for the NRPE service on the monitored hosts (debug=1) but the NRPE logs only tell me that my check_memory command has been added successfully.

I'm running NRPE 3.0.1-3 and Nagios 3.5.1.

How can I solve this issue or gather more information about the problem?

Florian Brucker

1 Answer

It turns out that there was a duplicate Nagios process running on the monitoring server which wasn't affected by restarting the service and therefore kept using an old, buggy version of the configuration. While we can't reconstruct how we ended up with two Nagios processes, killing the duplicate one solved the problem.
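For anyone who wants to check for the same situation, here is a rough sketch. It assumes a Linux host with procps ps and the Debian process name nagios3 (adjust as needed); count_top_level is a hypothetical helper name. Nagios forks short-lived children while running checks, so several processes are normal. The suspicious case is more than one long-lived process that is not a child of another Nagios process, which usually shows up as a parent PID of 1.

```shell
# Count processes whose parent is PID 1, given "PID PPID ..." lines on stdin.
# More than one such Nagios process suggests a stray duplicate daemon.
count_top_level() {
    awk '$2 == 1 { n++ } END { print n + 0 }'
}

# Usage on the monitoring host (process name nagios3 is an assumption):
#   ps -C nagios3 -o pid= -o ppid= -o lstart= -o args=
#   ps -C nagios3 -o pid= -o ppid= | count_top_level
```

Stopping the service and running the ps command again is another quick test: anything still listed afterwards is a process the service script does not control, and its start time (the lstart column) shows how long it has been running.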

Florian Brucker