I want to use monit to kill a process that uses more than X% CPU for more than N seconds.
I'm using stress
to generate load to try a simple example.
My .monitrc:
check process stress
matching "stress.*"
if cpu usage > 95% for 2 cycles then stop
I start monit (I checked syntax with monit -t .monitrc
):
monit -c .monitrc -d 5
And I launch stress:
stress --cpu 1 --timeout 60
Stress shows in top
as using 100 %CPU.
I'd expect monit to kill stress in about 10 seconds, but stress completes successfully. What am I doing wrong?
I also tried monit procmatch "stress.*"
, which shows two matches for some reason. Maybe that's relevant?
List of processes matching pattern "stress.*":
stress --cpu 1 --timeout 60
stress --cpu 1 --timeout 60
Total matches: 2
WARNING: multiple processes matched the pattern. The check is FIRST-MATCH based, please refine the pattern
EDIT: Tried e.lopez's method
I had to remove the start statement from .monitrc because it was causing a error in monit ('stress' failed to start (exit status -1) -- Program /usr/bin/stress timed out
and then a zombie process).
So launched stress manually:
stress -c 1
stress: info: [8504] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
The .monitrc:
set daemon 5
check process stress
matching "stress.*"
stop program = "/usr/bin/pkill stress"
if cpu > 5% for 2 cycles then stop
Launched monit:
monit -Iv -c .monitrc
Starting Monit 5.11 daemon
'xps13' Monit started
'stress' process is running with pid 8504
'stress' zombie check succeeded [status_flag=0000]
'stress' cpu usage check skipped (initializing)
'stress'
'stress' process is running with pid 8504
'stress' zombie check succeeded [status_flag=0000]
'stress' cpu usage check succeeded [current cpu usage=0.0%]
'stress' process is running with pid 8504
'stress' zombie check succeeded [status_flag=0000]
'stress' cpu usage check succeeded [current cpu usage=0.0%]
'stress' process is not running
'stress' trying to restart
'stress' start skipped -- method not defined
Monit sees the right process (pids match), but sees 0% usage (stress is using 1 cpu at 100% per top). I killed stress manually, which is when monit says the process is not running (at the end, above). So, monit is monitoring the process fine, but isn't seeing the right cpu usage.
Any ideas?