Every day without fail, sometime between 06:20 and 07:00 UTC, I get two email alerts from EC2 warning me that my "High VolumeWriteBytes" and "High VolumeReadBytes" have exceeded my set threshold. I get a massive spike of 2,000,000 bytes for a few minutes, then it returns to almost zero for the rest of the day.

I am running just a very simple WordPress website on the server, which gets very few visitors and none at that time of day apart from spiders. When I look in the Apache log, there is nothing unusual at that time.

How can I go about tracking down the cause of this problem?

I am thinking of writing the output of the "top" command to a file during that period, but I am not sure how to write the cron script, or whether top will reveal anything anyway.

z c
  • Do you have reporting like webalizer or similar? Those will write periodically. Also check your cron jobs for the time period to see if something runs during then. – Nathan C Jun 07 '13 at 13:25
  • No reporting programs. I have 13 files in my `/etc/cron.daily` though. I looked in `/etc/crontab` and found this entry (columns are `m h dom mon dow user command`): `25 6 * * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )`. So I think one of the daily cron jobs is causing the problem; the question now is how to find which one. – z c Jun 07 '13 at 13:35

1 Answer

Run atop as a daemon, logging every minute:

/usr/bin/atop -a -w /var/log/atop.log 60

Then use atop -r the next day to step through the logs, sorting by disk usage to see which process is responsible for the I/O surge.
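
To jump straight to the window where the alerts fire, the replay can be limited with atop's `-b` and `-e` options. The sketch below wraps that in a small function; `atop_window` is a hypothetical name, and the default log path is simply the one passed to `-w` above (a packaged daemon may write its logs elsewhere, so pass the path as the third argument if needed).

```shell
# Hypothetical wrapper around "atop -r"; not part of atop itself.
# $1 = begin time (hh:mm), $2 = end time (hh:mm), $3 = optional raw log file.
# The default path is an assumption: it is the file given to -w above.
atop_window() {
    atop -r "${3:-/var/log/atop.log}" -b "$1" -e "$2"
}
```

For example, `atop_window 06:20 07:00` opens the recorded samples for that period. Inside the viewer, `d` switches to the disk-oriented process view, `t` moves to the next sample and `T` to the previous one. The times are interpreted in the server's local time zone, which may or may not be UTC.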

Flup
  • Thank you. I have run that command and will check the result tomorrow. What command do I type to end the daemon? – z c Jun 07 '13 at 13:58
  • You don't say which distro you're using, but the Debian/Ubuntu package `atop` will start the daemon automatically (you'll have to edit `/etc/init.d/atop` to change the recording interval if you want to do that). If you do have an init script, you can just `/etc/init.d/atop stop`. If not, just kill the process. My advice would be to leave it running always -- the data can be invaluable. – Flup Jun 07 '13 at 14:03
  • Awesome. I have changed the recording interval to 120 secs and restarted it, will check the result tomorrow. – z c Jun 07 '13 at 14:17
  • OK, I have checked the result and found that cron is the culprit. The header row was `PID RDDSK WRDSK WCANCL DSK CMD 1/5` and the top entry was `722 3420K 4K 0K 98% cron`. But this is vague. I have 13 items in my daily cron, so how do I go about narrowing down which one is the problem? – z c Jun 08 '13 at 06:52
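
One possible way to narrow this down is to run the daily jobs by hand, one at a time, while atop is still recording, and then match each job's start time against the atop log. The following is only a rough sketch: `time_jobs` is a hypothetical helper, not something cron or atop provides, and it assumes it is acceptable to run the jobs outside their normal schedule.

```shell
# Hypothetical helper (not part of cron or atop): run each executable file in
# a directory one at a time and print when it started and how long it took.
time_jobs() {
    dir=$1
    for job in "$dir"/*; do
        # Skip anything that is not an executable regular file.
        [ -f "$job" ] && [ -x "$job" ] || continue
        started=$(date +%H:%M:%S)
        t0=$(date +%s)
        status=0
        # Discard the job's own output; only its exit status is kept.
        "$job" >/dev/null 2>&1 || status=$?
        t1=$(date +%s)
        printf '%s started=%s elapsed=%ss exit=%s\n' \
            "${job##*/}" "$started" "$((t1 - t0))" "$status"
    done
}
```

Running `time_jobs /etc/cron.daily` as root prints one line per job. Note that this really executes every job, and that it does not apply the filename rules `run-parts` uses, so it may run files cron would skip; on Debian-style systems, `run-parts --test /etc/cron.daily` lists what cron would actually run. Data already in the page cache may also make a manual run read less from disk than the scheduled one.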