I have a Python script that's being used as a plugin for NRPE. The script checks whether a process is running on a virtual machine by doing an SSH one-liner with a "ps ax | grep process" attached. When I execute the script manually, it works as expected: it returns a single line of output for NRPE as well as a status based on whether or not the process is running.

When I attempt to run the command set up to execute this script (from my Nagios server), I instantly get the output "NRPE: Unable to read output". When I run the script manually, however, it takes about a second before it returns output. Other commands run just fine, so it seems NRPE needs to wait a second or two for output rather than failing instantly, but I've been unable to find any way of accomplishing this. Any tips?

PS: The virtual machines are not accessible from anywhere other than the host machine, hence the need for the nrpe plugin to ssh from the host into the VM to check the process.
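
For reference, here is a minimal sketch of how a plugin like the one described could be structured. It is not the original script. It assumes the remote command exits 0 when the process is found and 1 when it is not (as `pgrep` does, swapped in here for the `ps ax | grep` pipeline), and the names `run_check` and `main` and the 10-second timeout are hypothetical. The point it illustrates is that the plugin always writes one line to stdout, even when SSH itself fails, since NRPE generally reports "NRPE: Unable to read output" when a plugin exits without printing anything to stdout.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes (WARNING = 1 is unused here).
OK, CRITICAL, UNKNOWN = 0, 2, 3


def run_check(command, process_name, timeout=10):
    """Run `command` and return (exit_code, one_line_message).

    Assumes `command` exits 0 when the process is found and 1 when it is
    not, as `pgrep` does; anything else is reported as UNKNOWN.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return UNKNOWN, f"UNKNOWN: check timed out after {timeout}s"
    except OSError as exc:
        return UNKNOWN, f"UNKNOWN: could not run check: {exc}"

    if result.returncode == 0:
        return OK, f"OK: {process_name} is running"
    if result.returncode == 1:
        return CRITICAL, f"CRITICAL: {process_name} is not running"

    # ssh itself exits 255 on connection or authentication failures, and
    # its error text goes to stderr, which NRPE typically does not relay.
    lines = result.stderr.strip().splitlines()
    detail = lines[-1] if lines else "no error output"
    return UNKNOWN, f"UNKNOWN: check failed (exit {result.returncode}): {detail}"


def main(command, process_name):
    """Print exactly one status line on stdout and return the exit code."""
    code, message = run_check(command, process_name)
    print(message)
    return code
```

A real plugin would finish with `sys.exit(main(command, process_name))`, where `command` is the SSH one-liner passed as an argument list.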

Bart De Vos
Rauffle
  • Would it not be possible to check the status of your process using SNMP instead? – Matthew Ife Nov 17 '11 at 23:15
  • No, nor can I use passive checks – Rauffle Nov 17 '11 at 23:20
  • Did you switch to `nagios` user before calling this plugin from Nagios server? – quanta Nov 18 '11 at 02:56
  • How did you fix this? I have the same problem. –  Jan 18 '12 at 16:06
  • @SamLambert - you can see the answer with the checkmark below. That's the answer that the user said helped them fix it. – Mark Henderson Jan 18 '12 at 19:31
  • @SamLambert I no longer recall exactly how I fixed this, but here are some things that helped me get on the right track: test everything as the 'nagios' user, adjust timeouts if needed, and make sure there aren't two installs of NRPE on the system. On the system I was working on, NRPE had been installed both manually from a tarball and via apt, so the configuration file I was initially working on wasn't actually doing anything: the OTHER install's config was the one actually being used. This meant everything I tried at first was in vain until I realized I was editing the ignored config file. – Rauffle Jan 25 '12 at 16:40

3 Answers

You can change the timeout by following the instructions here or by searching for "timeout" in the NRPE documentation. I don't think this is your issue, though, or you'd see an error like this:

CHECK_NRPE: Socket timeout after 270 seconds.

There is also probably an existing Nagios plugin that already returns the data you want.

Tablemaker
bdashrad
  • I've pretty much memorized the nrpe documentation by now, but it doesn't mention anything along the lines of wait times. Unfortunately the last update was about 4.5 years ago, so it may be outdated (I've had a few problems with it already). The issue isn't check_nrpe timing out because it's failing to connect to the host, but rather it appears to be executing the plugin, not immediately seeing output, then returning the given error. – Rauffle Nov 17 '11 at 23:27

Why not use the check_procs plugin?

On the virtual machine, define a command for your service in /etc/nagios/nrpe.cfg:

command[check_<service_name>]=/usr/lib64/nagios/plugins/check_procs -c 1:1 -C <service_name>

and from the Nagios server:

define service{
    use                     critical-service
    host_name               xx
    service_description     <service_name>
    check_command           check_nrpe!check_<service_name>
    event_handler           autostart_<service_name>!xx
    process_perf_data       0
    contact_groups          admin
}

A sample result:

# su - nagios -s /bin/bash
-bash-3.2$ /usr/local/nagios/libexec/check_nrpe -H xx -c check_<service_name>
PROCS OK: 1 process with command name '<service_name>'
quanta
  • The service isn't being run on the machine running NRPE, but on a virtual machine on that host. – Rauffle Nov 18 '11 at 16:56

I think the default timeout is around 10 seconds, so that's probably not it.

If you're using ssh to execute a check, remember that nrpe is probably running as user "nagios" (depending on install options). Does that user have the right keys and ssh options?
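
As a rough illustration of the SSH options that matter here, the sketch below builds a non-interactive `ssh` command line that a Python plugin could run. The function name `build_ssh_command`, the host name, the key path, and the use of `pgrep -x` are all assumptions for the example, not details from the question. `BatchMode=yes` makes `ssh` fail instead of prompting for a password or host-key confirmation, which is what you want when there is no terminal, and `-i` points at a key the `nagios` user can actually read.

```python
import shlex


def build_ssh_command(vm_host, process_name, identity_file, connect_timeout=5):
    """Return an argument list for a non-interactive ssh process check.

    All parameter values are supplied by the caller; nothing here is
    specific to any particular NRPE installation.
    """
    return [
        "ssh",
        "-o", "BatchMode=yes",  # never prompt; fail instead
        "-o", f"ConnectTimeout={connect_timeout}",  # seconds to wait for the connection
        "-i", identity_file,  # private key readable by the nagios user
        vm_host,
        # Quote the process name so the remote shell sees it as one word.
        "pgrep -x " + shlex.quote(process_name),
    ]


# Hypothetical values, for illustration only.
command = build_ssh_command("vm1", "my proc", "/home/nagios/.ssh/id_rsa")
print(command[-1])  # pgrep -x 'my proc'
```

Because `BatchMode=yes` also disables the host-key prompt, the VM's host key needs to be in the `nagios` user's `known_hosts` beforehand. Running the plugin once by hand as that user is a quick way to confirm it.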

cjc