So I set up a simple script to send an email alert when a certain web service stops running.
It has a simple flow of:
test=$( curl [address] | grep [a certain string in response] | wc -l )
if [ "$test" -ne 1 ]; then
echo "there has been an error" | mail -s "Error" -t "[my-mail-address]"
fi
and in crontab it is set to do the check once every five minutes:
*/5 * * * * sh /path/to/script/
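For reference, the check can be written out as a self-contained sketch along these lines. The variable names `URL`, `PATTERN` and `RECIPIENT`, their default values, and the function names are placeholders made up for illustration, since the real address and search string are elided above. `grep -c` stands in for the `grep | wc -l` pair, and the `mail` invocation is copied unchanged from the script.

```shell
#!/bin/sh
# Sketch only: URL, PATTERN and RECIPIENT are hypothetical placeholders;
# the real address, search string and mail address are elided in the post.
URL="${URL:-http://localhost/}"
PATTERN="${PATTERN:-OK}"
RECIPIENT="${RECIPIENT:-root}"

# Read a response body on stdin and print how many lines contain $1.
# grep exits non-zero when nothing matches but still prints the count (0),
# so the exit status is ignored here.
count_matches() {
    grep -c -- "$1" || true
}

# Succeed only when exactly one line matched, mirroring the "-ne 1" test.
is_healthy() {
    [ "$1" -eq 1 ]
}

# One check: fetch the page, count matching lines, send a mail on failure.
run_check() {
    count=$(curl --silent "$URL" | count_matches "$PATTERN")
    if ! is_healthy "$count"; then
        echo "there has been an error" | mail -s "Error" -t "$RECIPIENT"
    fi
}
```

Adding a `run_check` call as the last line turns this into the script that cron runs. Nothing in it loops, so each cron invocation can send at most one mail.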
It worked well for a couple of days, but about ten minutes ago almost a hundred e-mails from the server arrived all at once. That shouldn't be possible, since the script doesn't contain any loops.
Syslog:
Jan 26 01:05:01 sv1 CRON[23310]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
Jan 26 01:10:01 sv1 CRON[23815]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
Jan 26 01:12:12 sv1 kernel: [5962667.417178] [ 1106] 0 1106 5914 168 17 0 0 cron
Jan 26 01:12:12 sv1 kernel: [5962667.417250] [27493] 0 27493 14949 224 34 0 0 cron
Jan 26 01:12:12 sv1 kernel: [5962667.417252] [27939] 0 27939 14949 224 34 0 0 cron
Jan 26 01:12:12 sv1 kernel: [5962667.417254] [28436] 0 28436 14948 224 34 0 0 cron
Jan 26 01:12:12 sv1 kernel: [5962667.417256] [28943] 0 28943 14949 224 34 0 0 cron
Jan 26 01:12:12 sv1 kernel: [5962667.417258] [29408] 0 29408 14949 224 34 0 0 cron
...
* This continues for 800+ lines with similar timestamps (until 01:12:24). The timestamps of these 800+ lines coincide with the arrival of the simultaneous mails. That is odd, because cron is scheduled to run every 5 minutes, as the first 2 lines show. The lines starting at 01:12:12 are the suspicious ones.
Update:
I just brought the service down again and let cron and the script do their job. Only a single mail was sent.
As the test is a very simple true/false, I am struggling to figure out what kind of special circumstances would result in multiple mails being sent simultaneously.