0

So I set up a simple script to send an email alert when a certain web service stops running.

It has a simple flow of:

test = $( curl [address] | grep [a certain string in response] | wc -l )
if [ $test -ne 1 ]; then 
  echo "there has been an error" | mail -s "Error" -t "[my-mail-address]"
fi

and in crontab it is set to do the check once every five minutes:

*/5 * * * * sh /path/to/script/

It was working well for a couple of days, but suddenly about ten minutes ago, almost hundred e-mails from the server were received simultaneously. It doesn't seem possible at all since there aren't even any loops in the script.

Syslog:

Jan 26 01:05:01 sv1 CRON[23310]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
Jan 26 01:10:01 sv1 CRON[23815]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
Jan 26 01:12:12 sv1 kernel: [5962667.417178] [ 1106]     0  1106     5914      168      17        0             0 cron
Jan 26 01:12:12 sv1 kernel: [5962667.417250] [27493]     0 27493    14949      224      34        0             0 cron
Jan 26 01:12:12 sv1 kernel: [5962667.417252] [27939]     0 27939    14949      224      34        0             0 cron
Jan 26 01:12:12 sv1 kernel: [5962667.417254] [28436]     0 28436    14948      224      34        0             0 cron
Jan 26 01:12:12 sv1 kernel: [5962667.417256] [28943]     0 28943    14949      224      34        0             0 cron
Jan 26 01:12:12 sv1 kernel: [5962667.417258] [29408]     0 29408    14949      224      34        0             0 cron
...

* this continues for about 800+ lines with similar timestamp (until 01:12:24). The timestamp of these 800+ lines coincide with the simultaneous mails. It is odd as the cron is scheduled to run every 5 mins, hence the first 2 lines. The lines starting from 01:12:12 are the fishy ones.

Update:

Just brought the service down again and let cron and the script do their job. A single mail was sent.

As the test is a very simple true/false, I am struggling to figure out what kind of special circumstances would result in multiple mails being sent simultaneously.

Reuben L.
  • 111
  • 1
  • 7
  • why would you pipe some output to mail instead of dumping to stdout and letting cron send the mail? – AD7six Jan 25 '15 at 17:23
  • @AD7six E.g, `59 */6 * * * script.sh | mail -s "Subject of Mail" someother@address.com` http://www.nixtutor.com/linux/sending-email-alerts-through-cron/? – 030 Jan 25 '15 at 17:31
  • @AD7six Not sure if I'm making sense but the MAILTO is a different user from the person handling the API. And for this case, only the latter needs the alert. But for other cron issues, a different user needs the alert. – Reuben L. Jan 25 '15 at 17:31
  • @utrecht no. `man 5 crontab` and look for MAILTO. [example](https://gist.github.com/AD7six/f229c009a348780f2219). Does not explain the number of emails - logically if there were 124 emails received and this is the only script sending emails - it was called 124 times. 123 not via cron. – AD7six Jan 25 '15 at 17:37
  • @ReubenL. Could you check the cron and syslog? `tail -f /var/log/cron`. What time did you receive 124 emails? Could you checks the log around that time? – 030 Jan 25 '15 at 17:41
  • I realised something interesting, the emails were sent at 1:12 AM server time. Which is odd, as cron should have run at 1:10 AM and since the script is very simple, should not take more than a few seconds. – Reuben L. Jan 25 '15 at 17:44
  • i've updated the question with the dump. the cron executes as expected at 1:05 and 1:10 but at 1:12, something odd happens (800+ lines dumped). – Reuben L. Jan 25 '15 at 17:49
  • Problems with mail delivery? Check the email timstamps and your mailserver logs. – dmourati Jan 25 '15 at 17:53
  • Mail logs and headers were consistent in terms of time. I'm now curious about the 800+ lines in the cron log – Reuben L. Jan 25 '15 at 17:57
  • If during the test, the response from the API takes a long time, that should still not cause such an issue right? curl is not async here as far as i know. – Reuben L. Jan 25 '15 at 17:59
  • @ReubenL. `the response from the API takes a long time`. Is this caused by the API or the grep command? The grep command time could be shortened (see updated answer) – 030 Jan 25 '15 at 22:27
  • change `echo "there has been an error"` to `echo "there has been an error at \`date +'%d-%m-%y %H:%M'\`"` (or whatever format you like) to avoid such problems in future. – Nehal Dattani Jan 26 '15 at 09:37

1 Answers1

1

Are you sure It was working well for a couple of days....? This means that a mail was sent every 5 minutes.

It could be posssible that the mails could not be sent for some reason resulting in a queue and when the connectivity issue was solved that all the mails were sent. In order to find the problem the mail-log should be checked.

The cron should be debugged. Check the syslog and the cronlog:

sudo less /var/log/cron

Some information regarding the cron should be found at the time the 124 mails were sent.

Also check this Q&A. If a system is too busy cron jobs could be summed up and then a daemon should be considered.

Check the output of curl [address] | grep [a certain string in response] | wc -l. Does it take a long time before the command has been executed? Why do you grep all similarities? The first hit should be sufficient. | head -1 could be used.

030
  • 5,901
  • 13
  • 68
  • 110
  • As for working well, yes, the service is used by a mobile application and we have been keeping tabs on it (on top of the script). Mail logs showed nothing out of the ordinary. No network issues, until the fateful moment when 124 mails were sent in quick succession. – Reuben L. Jan 25 '15 at 17:04
  • 1
    What does the mail headers say? If they where queued up, they will still show five minute intervals for their date headers. – Sven Jan 25 '15 at 17:08
  • The mail headers appear normal as well. Date and times are consistent and no sign of the mails being queued. – Reuben L. Jan 25 '15 at 17:16
  • The test is a simple POST to check if a API is running correctly. – Reuben L. Jan 25 '15 at 17:17
  • 1
    @ReubenL. Please specify it. Which command do you issue? Does the script contain other content or just the content as described in the question? Did it occur once that 124 emails were sent instead of 1 or occurs it more often? – 030 Jan 25 '15 at 17:18
  • Just a `curl [address] | grep [a certain string in response] | wc -l` – Reuben L. Jan 25 '15 at 17:22
  • When we were testing the script 2 days back (by bring down the service), it sent single mails as expected. This is the first time multiple mails have been sent. – Reuben L. Jan 25 '15 at 17:24