3

When my server gets into high load, a graceful restart of Apache seems to bring things back under control. So I set up monit, with this configuration:

set daemon 10
check system localhost
      if loadavg (1min) > 5 then exec "/etc/init.d/apache2 graceful"

So every 10 seconds, I poll the server load, and when it gets above 5, I gracefully restart Apache. However, that temporarily raises the load, so we get into a death spiral. What I want is for it to notice after 10 seconds that the load is 5 or more, and gracefully restart Apache, then wait for 5 minutes or so before checking that particular metric again.

Is there a way to do this with monit?

Gilles 'SO- stop being evil'
  • 104,111
  • 38
  • 209
  • 254
Ben Dilts
  • 10,535
  • 16
  • 54
  • 85

2 Answers2

2

It's not entirely within monit, but it's close enough

set daemon 10
check system localhost
  if loadavg (1min) > 5 then unmonitor
  if loadavg (1min) > 5 then exec "/etc/init.d/apache2 graceful"
  if loadavg (1min) > 5 then exec "python /scripts/remonitor.py"

Then you have a python script, like so:

import time, os

time.sleep(5*60)
os.system("monit monitor system")

So this will:
1. unmonitor "system" when it reaches too much load, to prevent the death spiral
2. restart apache gracefully
3. start the script that will re-monitor the "system" in 5 minutes

rfadams
  • 1,898
  • 2
  • 19
  • 21
0

What about

set daemon 10

set limits { programtimeout: 300 seconds }

check system localhost
   if loadavg (1min) > 5 then exec "/bin/sh -c '/etc/init.d/apache2 graceful && sleep 5m'"

or even

set daemon 10

check system localhost
   start program = "/bin/sh -c '/etc/init.d/apache2 graceful && sleep 5m'" with timeout 330 seconds
   if loadavg (1min) > 5 then start

I.e., just add the sleep 5m shell command after the command to restart Apache and add the appropriate timeout to the monitrc.

Alex Che
  • 6,659
  • 4
  • 44
  • 53