4

I have a problem with my nagios monitoring. I'm trying to check a process on a remote host using nrpe.

The host was already being monitored by nagios, so I only needed to add a line to the nrpe.cfg file. There was even a check_procs check already defined, so I could use that as an example.

Simple, you might think, but no. I checked whether I could run the command manually, and there were no problems there:

ubuntu@host:/usr/lib/nagios/plugins$ ./check_procs -w 1:1 -c 1:1 -a delayed_job
PROCS OK: 1 process with args 'delayed_job'
ubuntu@host:/usr/lib/nagios/plugins$ sudo ./check_procs -w 1:1 -c 1:1 -a delayed_job
PROCS OK: 1 process with args 'delayed_job'

This is a piece of my nrpe.cfg file:

command[check_procs]=/usr/lib/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
command[check_proc_name]=/usr/lib/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$ -a $ARG3$

The first check, check_procs, is monitored correctly. The last line is the one I added, but it responds with this: PROCS CRITICAL: 2 processes with args 'delayed_job'

This is my service file :

define service {
  use                 generic-service
  host_name           imobiel.limburger.nl
  service_description Check Delayed Job Proces
  check_command       check_nrpe!check_proc_name!1:1 1:1 delayed_job
}

Does anybody have an idea? I have already restarted the nagios server and the nrpe server several times! Hopefully someone has experienced the same issue?

Thanks a lot in advance.

Niels
  • I changed the configuration a little, like this: command[check_proc_name]=/usr/lib/nagios/plugins/check_procs -w 1:1 -c 1:1 -a delayed_job and removed the parameters from the check_nrpe command. Now nagios returns status UNKNOWN, saying it didn't receive a response from nrpe, while the other nrpe checks on the server are fine! Does anybody know another approach? – Niels Feb 15 '12 at 08:44
  • Is the remote system a Solaris box, by chance? – Keith Feb 20 '12 at 19:39
  • No, sorry, it's a Debian-based distro. – Niels Mar 20 '12 at 19:18

4 Answers

4

I think it's a bug in check_procs: it seems to pick up itself when running from nrpe, as opposed to the command line. Maybe a race condition of some sort.

I see you're running Ubuntu, given your command prompt. With the check_procs available on Lucid, I'm able to do something like:

/usr/lib/nagios/plugins/check_procs --ereg-argument-array="[s]tring" -w 1:1

So, we'll do a pattern match on the argument list, but in such a way that the check_procs process won't be matched. Putting one character of the pattern into square brackets causes the expression to match on "string" but it obviously won't match on the check_procs argument "[s]tring".
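
To make the bracket trick concrete, here is a small sketch that needs neither Nagios nor a running delayed_job. It fakes two process argument strings, the worker and the check that is looking for it, and counts how many of them a pattern matches with grep -E. The worker command line and the helper names simulate_ps and count_matches are made up for illustration; only the pattern idea comes from the answer above.

```shell
#!/bin/sh
# Simulate the argument strings of two processes: a hypothetical worker
# and the check_procs invocation that searches for it with pattern $1.
simulate_ps() {
    printf '%s\n' \
        'ruby script/delayed_job start' \
        "/usr/lib/nagios/plugins/check_procs -w 1:1 -c 1:1 --ereg-argument-array=$1"
}

# Count how many simulated argument strings match extended regex $1.
count_matches() {
    simulate_ps "$1" | grep -E -c -- "$1"
}

count_matches 'delayed_job'     # plain pattern: also matches the check's own arguments
count_matches '[d]elayed_job'   # bracketed pattern: matches only the worker
```

The first call prints 2 and the second prints 1, because the literal text "[d]elayed_job" in the check's own arguments does not contain the string "delayed_job" that the regex is looking for.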

The check_procs available on Hardy doesn't have the regex option, though.
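
For the nrpe.cfg in the question, that could look roughly like the fragment below. Treat it as a sketch: the command name check_delayed_job is made up, the thresholds are hard-coded rather than passed as $ARGn$, and it assumes a check_procs build that has --ereg-argument-array and that NRPE hands the command line to a shell, so the single quotes keep the brackets intact.

```shell
# nrpe.cfg (sketch, hypothetical command name)
command[check_delayed_job]=/usr/lib/nagios/plugins/check_procs -w 1:1 -c 1:1 --ereg-argument-array='[d]elayed_job'
```

The service definition would then call this command through check_nrpe without extra arguments.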

cjc
  • This is absolutely not a bug in check_procs; if you look at the source, and find the line under "Ignore self", you can see that it explicitly... ignores itself. – Keith Feb 14 '12 at 18:54
  • Yes, I see the source and I see the "Ignore self" section. And yet, with version 1.4.14, I see the same behavior the OP is seeing: when running check_procs directly, it's correct in seeing 1 proc, but if it's run via nrpe it finds 2 procs. And I haven't duplicated checks in my nrpe configs, like the OP has done. – cjc Feb 14 '12 at 19:46
  • @cjc Do you have the same problem as me, then? Did you find any answers? – Niels Feb 15 '12 at 08:49
  • @user1132127, my solution was to use the --ereg-argument-array as I have above, basically use the pattern like described in, say, this: http://linuxcommando.blogspot.com/2008/03/trick-grep-not-to-report-itself-in.html – cjc Feb 15 '12 at 09:48
  • Same bug on check_procs v1.4.15 (nagios-plugins 1.4.15). Works with --ereg-argument-array. – w00t Apr 26 '15 at 12:33
3

The problem is with the /bin/ps output on the host. By default, the check_procs binary runs "/bin/ps -axwo" on the checked system, which truncates the argument string. To fix this, recompile nagios-plugins from source. With version 1.4.15, set these configure options:

./configure --enable-extra-opts=yes --with-ps-command="/bin/ps -axwwo 'stat uid pid ppid vsz rss pcpu ucomm command'" --with-ps-format="%s %d %d %d %d %d %f %s %n" --with-ps-cols=9 --with-ps-varlist="procstat,&procuid,&procpid,&procppid,&procvsz,&procrss,&procpcpu,procprog,&pos"

With -axwwo, ps gives us the full argument string. Sorry for my English.
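
Why truncation would matter can be shown without touching ps at all. The sketch below cuts a made-up command line to a fixed width, the way a width-limited listing would, and then checks whether the pattern is still visible. The command line, the 40-column width, and the helper names cut_to_width and has_pattern are assumptions for illustration, not what ps or check_procs do internally.

```shell
#!/bin/sh
# Return the first $2 characters of string $1, imitating a listing
# that cuts long command lines at a fixed width.
cut_to_width() {
    printf '%s\n' "$1" | cut -c "1-$2"
}

# Print "match" if pattern $2 occurs in string $1, else "no match".
has_pattern() {
    if printf '%s\n' "$1" | grep -q -- "$2"; then
        echo "match"
    else
        echo "no match"
    fi
}

# Hypothetical long command line with the interesting part near the end.
cmdline='/usr/bin/ruby /var/www/app/releases/20120214/script/delayed_job start'

has_pattern "$cmdline" 'delayed_job'                       # full line
has_pattern "$(cut_to_width "$cmdline" 40)" 'delayed_job'  # cut at 40 columns
```

The full line reports "match", while the 40-column version reports "no match", since the cut lands before "delayed_job" appears.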

DeepSpirit
0

What version of nagios-plugins do you have? check_procs in 1.4.15 does not exhibit this behavior. I have not checked earlier versions, though.

Add "-vv" or "-vvv" to the end of your manual test, and you can verify exactly what 'ps' command it is running, what it is seeing, and what it considers matching.

If you're actually having Nagios run both "check_procs" and "check_proc_name" as they are defined in your nrpe.cfg snippet, at the same time, with the same exact args... then it's quite possible that they would pick up each other in the count. But why would you be running two checks that do exactly the same thing?

Keith
0

I actually had this problem too; for me, specifying the user worked, for instance -u root. Since the NRPE server runs its checks as the Nagios user, explicitly naming the user avoids this problem.
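
A rough way to see why the user filter helps is sketched below. It fakes a two-line process list with an owner column and counts matches with and without an owner filter. Everything in the list is an assumption for illustration: the worker is owned by root, as in the -u root example, and the second process is a wrapper shell owned by nagios that carries the check's arguments. The helper names are made up too.

```shell
#!/bin/sh
# Fake process list: "<owner> <command line>". Both owners and both
# command lines are hypothetical.
simulate_ps_users() {
    printf '%s\n' \
        'root ruby script/delayed_job start' \
        'nagios sh -c /usr/lib/nagios/plugins/check_procs -w 1:1 -c 1:1 -a delayed_job'
}

# Count lines containing string $1; if $2 is non-empty, only count
# lines whose owner column equals $2.
count_for_user() {
    simulate_ps_users | awk -v pat="$1" -v user="$2" '
        (user == "" || $1 == user) && index($0, pat) { n++ }
        END { print n + 0 }'
}

count_for_user 'delayed_job' ''       # no owner filter
count_for_user 'delayed_job' 'root'   # only processes owned by root
```

Without the filter the count is 2; restricted to root it is 1, because the process that carries the check's own arguments belongs to a different user.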