I am running a web app on a Tomcat server. There is a hard-to-detect problem in the server code that causes it to crash once or twice every day. I will dig in and fix it when I have time, but until then, restarting Tomcat (/etc/init.d/tomcat7 restart) or simply rebooting the machine when the problem occurs seems like a good enough solution. I want to detect the liveness of the server with wget rather than grep or something else, because even though Tomcat is running, my service may be down.

wget localhost:8080/MyService/

outputs

--2012-12-04 14:10:20--  http://localhost:8080/MyService/
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:8080... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2777 (2.7K) [text/html]
Saving to: “index.html.3”

100%[======================================>] 2,777       --.-K/s   in 0s

2012-12-04 14:10:20 (223 MB/s) - “index.html.3” saved [2777/2777]

when my service is up, and outputs

Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:8080... failed: Connection refused.

or just hangs after printing

--2012-12-04 14:07:34--  http://localhost:8080/MyService/
Resolving localhost... 127.0.0.1
Connecting to localhost|127.0.0.1|:8080... connected.
HTTP request sent, awaiting response...

Can you offer me a shell script with a cron job, or something else, to do that? I would prefer not to use cron if there is an alternative.

3 Answers

Instead of scripting from scratch, I highly recommend using Monit. I found this page which gives you some basics, but I find the implementation there a bit sloppy, so let me iron that out. This will explain how to set Monit up on Ubuntu 12.04. First, install monit from the repository like so:

sudo aptitude install monit

Next, you want to adjust the mailserver settings so you can get e-mail alerts. Just open the monit config like this:

sudo nano /etc/monit/monitrc

Now look for the area with mailserver settings and insert this line:

set mailserver localhost

This is the ruleset I use for Apache. First, create the config file with the command below, then place the ruleset that follows in it:

sudo nano /etc/monit/conf.d/apache2.conf

check process apache with pidfile /var/run/apache2.pid
        start "/etc/init.d/apache2 start"
        stop  "/etc/init.d/apache2 stop"
        if failed host 127.0.0.1 port 80
                with timeout 15 seconds
        then restart
        if loadavg (1min) greater than 7
                for 5 cycles
        then restart
        alert my_email@server.host only on { timeout, nonexist, resource }

Then restart monit like so:

sudo service monit restart

That ruleset checks port 80 on the localhost address 127.0.0.1, and if the connection fails or does not respond within the 15 second timeout, the Apache service is restarted. I also have a load average rule attached to it: if the 1-minute load average stays above 7 for 5 monitoring cycles in a row, it will restart the Apache service.

For Tomcat, adapting the rule from the page mentioned above would look like this. First, open a file for editing in the monit config directory like this:

sudo nano /etc/monit/conf.d/tomcat.conf

And place this ruleset in it:

check host tomcat with address localhost
        stop program = "/etc/init.d/tomcat7 stop"
        start program = "/etc/init.d/tomcat7 restart"
        if failed port 8080 and protocol http
        then start
        alert my_email@server.host only on { start, nonexist }

Then restart monit like so for the new rules to take effect:

sudo service monit restart

I would double-check the { start, nonexist } as I am just guessing now since I do not have a Tomcat setup to test with. But that should be good.

You can follow the monit log here:

sudo tail -f -n 200 /var/log/monit.log
Giacomo1968

I hope you have already found the root cause of your problem and been able to fix it properly. In case you or someone else still needs a solution for this, here is an attempt at an answer.

The thing here is that your service may sometimes 'hang', and the monitoring must be able to catch that as well. In the simple script below we run the wget status query in the background, wait a few seconds, and if it has not been able to retrieve status 200 from the service, restart it.

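#!/bin/sh
# WARNING, UNTESTED CODE !

TMPFILE=$(mktemp)
WAITTIME=15

# Run the test in the background. -o writes wget's log to $TMPFILE;
# -O /dev/null discards the page so no index.html.N files pile up.
wget -O /dev/null -o "$TMPFILE" localhost:8080/MyService/ &
WGETPID=$!

# Wait a few seconds and let the test finish
sleep "$WAITTIME"

# Stop wget if it is still hanging (ignore the error if it already exited)
kill "$WGETPID" 2>/dev/null

# Restart unless the log contains the "HTTP request sent ... 200 OK" line
if ! grep "HTTP request sent" "$TMPFILE" | grep -q "200 OK"; then
    echo "The service did not return 200 in $WAITTIME seconds."
    echo "Restarting it."
    /etc/init.d/tomcat7 restart
fi

# Cleanup
rm -f "$TMPFILE"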
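
As a variation on the script above, wget can enforce the time limit itself, which avoids the background job, the sleep and the kill. The following is only a minimal sketch: check_service is a hypothetical helper name, and the URL and timeout are passed as arguments. It relies on wget's exit status, which is non-zero both when the connection fails or times out and when the server answers with an error status, so it is slightly broader than looking for "200 OK" in the log. Note that --timeout limits each DNS, connect and idle read wait rather than the total run time.

```shell
#!/bin/sh
# Sketch: let wget enforce the time limit instead of sleep + kill.
# check_service is a hypothetical helper; $1 is the URL, $2 the timeout in seconds.

check_service() {
    url=$1
    timeout=${2:-15}
    # --tries=1: do not retry; --timeout: limit DNS, connect and read waits;
    # -q -O /dev/null: print nothing and do not save the page to disk.
    wget -q --tries=1 --timeout="$timeout" -O /dev/null "$url"
}

# Example use (commented out so that sourcing this file has no side effects):
# check_service http://localhost:8080/MyService/ 15 || /etc/init.d/tomcat7 restart
```

The function returns wget's exit status unchanged, so it can be used directly in an if or with || as in the commented example.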

For scheduling, I really recommend cron for its simplicity. Another choice would be to run this as a daemon, which would introduce unnecessary complexity, IMHO. Some other (external) scheduler could also be used, but I find cron the simplest.
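
As a rough sketch of the cron route, assuming the script above is saved as /usr/local/bin/check_myservice.sh and made executable (both the path and the log file name are hypothetical), an entry in root's crontab, edited with sudo crontab -e, could look like this:

```
# Run the check every 5 minutes and append its output to a log file
*/5 * * * * /usr/local/bin/check_myservice.sh >> /var/log/check_myservice.log 2>&1
```

The interval should probably be comfortably longer than the wait time plus however long Tomcat takes to start, so that a restart is not triggered while the previous one is still coming up.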

Hopefully this helps.

grassroot

Monit is a good tool for this. It will monitor services or server status like Tomcat (or hard drive space etc) and it will restart them, send you an email etc according to what you put in a configuration file, being more powerful and flexible than a Bash script (which you may prefer for simplicity).

LinuxDevOps