
I need to be informed by nagios when a process on a remote server is restarted.

The only thing I do not know is how to check the process state, and which way of doing it is correct.

On the remote server I currently have this NRPE command: ./check_procs -c 1: -a "/usr/local/yyyprogram/sbin/XXXdaemon" -s Sl. This process must run all the time and has its own restart mechanism, so the only thing I need to know is exactly when it restarts. Which process states should I add here, and how: is -s SlRD OK, or -s Sl -s R -s D? Or is there another way to get this information as OK|WARNING|UNKNOWN|CRITICAL? The only acceptable status for me is OK (meaning the process is working).

Also, how should I monitor it from another Nagios server? Should I run the check every second? It is fine if I am notified one or two minutes after the service restarts, but how can I know it happened without checking logs? The PID of the service is different after the restart mechanism runs.

How can I be sure that all process states are covered by the NRPE command line in the config?

Please help:)

EDIT

root@server:/usr/local/nagios/libexec# ./check_procs -vv -a "/usr/local/yyyprogram/sbin/xxxdaemon"
CMD: /usr/bin/ps axwwo 'stat uid pid ppid vsz rss pcpu cgroup:256 comm args'
Matched: uid=0 vsz=9412 rss=2804 pid=517515 ppid=1 jid=0 pcpu=0.20 stat=Sl etime= prog=xxxdaemon args=/usr/local/yyyprogram/sbin/xxxdaemon -d /usr/local/yyyprogram/conf -b
 cgroup_hierarchy=(null)
Kamil Bu

1 Answer


First and foremost, if you are interested in how long a process has been running, check_procs does not offer that functionality as far as I can see from its -h output, so I'm not sure why you assume it does. Or is that not what you're trying to check?

If you want to check how long a process has been running, you don't need a plugin for it. This example gets the PID of netdata, asks ps for etimes (the elapsed time in seconds), uses grep to keep only the line with the number, and uses xargs to strip the extra spaces around it:

$ ps -p $(pidof /usr/sbin/netdata) -o etimes | grep -E "[1-9].*" | xargs
65805

$ systemctl restart netdata

$ ps -p $(pidof /usr/sbin/netdata) -o etimes | grep -E "[1-9].*" | xargs
10

All you have to do is write a shell script that checks whether the value is below a certain number and exits with 1 if it is, then run that script over NRPE from Nagios.
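
A minimal sketch of such a script could look like the following. The script name check_restart.sh, the function names, and the 300 second default threshold are assumptions made for illustration, not part of any existing plugin. It uses `etimes=` so that ps prints no header line, and it follows the usual Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/bash
# Hypothetical check_restart.sh: warn when a process was (re)started recently.

# classify_uptime ELAPSED_SECONDS THRESHOLD_SECONDS
# Prints a Nagios-style status line and returns the matching plugin exit code.
classify_uptime() {
    local elapsed="$1" threshold="$2"
    case "$elapsed" in
        ''|*[!0-9]*)
            # Empty or non-numeric input: ps gave us nothing usable.
            echo "UNKNOWN - could not read elapsed time"
            return 3 ;;
    esac
    if [ "$elapsed" -lt "$threshold" ]; then
        echo "WARNING - process started ${elapsed}s ago (threshold ${threshold}s)"
        return 1
    fi
    echo "OK - process running for ${elapsed}s"
    return 0
}

# check_restart PROCESS_PATH [THRESHOLD_SECONDS]
check_restart() {
    local proc="$1" threshold="${2:-300}" pid elapsed
    # pidof -s returns a single PID; empty output means the process is not running.
    pid="$(pidof -s "$proc")"
    if [ -z "$pid" ]; then
        echo "CRITICAL - $proc is not running"
        return 2
    fi
    # "etimes=" prints elapsed seconds without a header; tr strips the padding.
    elapsed="$(ps -o etimes= -p "$pid" | tr -d ' ')"
    classify_uptime "$elapsed" "$threshold"
}

# Run the check only when a process path is given on the command line.
if [ "$#" -gt 0 ]; then
    check_restart "$@"
    exit $?
fi
```

On the remote host this could be wired into nrpe.cfg with a line such as `command[check_xxxdaemon_restart]=/usr/local/nagios/libexec/check_restart.sh /usr/local/yyyprogram/sbin/XXXdaemon 300` (the command name and install path are again assumptions). The threshold should be longer than the Nagios check interval, otherwise a restart that happens between two checks can go unnoticed.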

pzkpfw
  • No, I do not need to know how long it has been running. I need to know, and be informed by Nagios, that it has restarted. I do not know which flags I should monitor or how to get this information. – Kamil Bu Mar 18 '22 at 17:15
  • Checking the etimes would tell you if it has restarted, and I just told you how to check it. In what way does this not answer your question? What have you tried so far? – pzkpfw Mar 19 '22 at 20:33