How to find cause of EC2 micro instance regular read/write surge?

Question

Every day without fail, sometime between 06:20 and 07:00 UTC, I get two email alerts from EC2 warning me that my "High VolumeWriteBytes" and "High VolumeReadBytes" have exceeded my set threshold. I get a massive spike of 2,000,000 bytes for a few minutes, then it returns to almost zero for the rest of the day.

I am running only a very simple WordPress website on the server, which gets very few visitors anyway and none at that time of day apart from spiders. The Apache log shows nothing unusual at that time.

How can I go about tracking down the cause of this problem?

I am thinking of writing the output of the top command to a file during that period, but I am not sure how to write the cron script, or whether top would reveal anything anyway.
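One way to do what the question describes is a small shell function that appends timestamped snapshots of top, run in batch mode, to a log file. This is a hedged sketch: the function name `capture_top` and the log path are hypothetical. Note that top does not show per-process disk bytes. It only shows which processes were running and the overall iowait (`wa`) figure, so a tool that records per-process I/O is a better fit for this particular problem.

```shell
# Sketch: append timestamped snapshots of `top` (batch mode) to a log file.
# Hypothetical names: capture_top and /var/log/top-capture.log are
# illustrative only, not part of any existing tool.
capture_top() {
    count=${1:-45}                              # number of snapshots
    interval=${2:-60}                           # seconds between snapshots
    logfile=${3:-/var/log/top-capture.log}      # where snapshots are appended
    i=0
    while [ "$i" -lt "$count" ]; do
        # UTC timestamp, to line up with the times in the email alerts
        echo "=== $(date -u '+%Y-%m-%d %H:%M:%S UTC') ===" >> "$logfile"
        # -b: batch mode (no terminal needed); -n 1: one iteration only.
        # head keeps the summary lines plus the first processes listed
        # (top sorts by CPU usage by default).
        top -b -n 1 | head -n 25 >> "$logfile"
        i=$((i + 1))
        # Do not sleep after the last snapshot.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

To cover the alert window, the function could be saved in a script (for example a hypothetical /usr/local/bin/capture-top.sh ending with the line `capture_top 45 60`) and scheduled from /etc/crontab with an entry such as `15 6 * * * root /usr/local/bin/capture-top.sh`. That takes one snapshot a minute from 06:15 until about 07:00, assuming the instance clock is set to UTC.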

Do you have reporting software such as Webalizer? Those tools write periodically. Also check your cron jobs to see if something runs during that period. — Nathan C, Jun 07 '13 at 13:25
No reporting programs. I do have 13 files in my /etc/cron.daily, though. I looked in /etc/crontab and found this:

# m h dom mon dow user command
25 6 * * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )

So I think one of the daily cron jobs is causing the problem, but now the problem is finding out which one. — z c, Jun 07 '13 at 13:35
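The crontab line quoted above is worth decoding. Its first five fields are minute, hour, day of month, month and day of week, so `25 6 * * *` starts run-parts over /etc/cron.daily at 06:25 every day. Cron uses the system's local time zone, so if the instance runs on UTC (common on EC2 images, but an assumption here) that start time falls inside the 06:20 to 07:00 alert window. The `test -x /usr/sbin/anacron ||` guard also means the line does nothing when anacron is installed; in that case anacron runs cron.daily on its own schedule. A minimal sketch that extracts the start time, assuming plain numeric minute and hour fields (the function name `cron_daily_time` is hypothetical):

```shell
# Sketch: print the HH:MM at which the system crontab starts cron.daily.
# Assumes plain numeric minute and hour fields (e.g. "25 6", not "*/5").
# cron_daily_time is a hypothetical helper name.
cron_daily_time() {
    crontab_file=${1:-/etc/crontab}
    # Skip comment lines; on the cron.daily line $1 is the minute, $2 the hour.
    awk '!/^[ \t]*#/ && /cron\.daily/ { printf "%02d:%02d\n", $2, $1 }' "$crontab_file"
}
```

Run against the /etc/crontab shown above, `cron_daily_time` would print 06:25.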

score 1 · Accepted Answer · answered Jun 07 '13 at 13:33

Run atop as a daemon, logging every minute:

nohup /usr/bin/atop -a -w /var/log/atop.log 60 > /dev/null 2>&1 &

Then, the next day, replay the log with atop -r /var/log/atop.log: press t to step forward to the next sample (T steps back) and D to sort processes by disk activity, to see which process is responsible for the I/O surge.

Flup

Thank you. I have run that command and will check the result tomorrow. What command do I type to end the daemon? – z c Jun 07 '13 at 13:58
You don't say which distro you're using, but the Debian/Ubuntu package `atop` will start the daemon automatically (you'll have to edit `/etc/init.d/atop` to change the recording interval if you want to do that). If you do have an init script, you can just `/etc/init.d/atop stop`. If not, just kill the process. My advice would be to leave it running always -- the data can be invaluable. – Flup Jun 07 '13 at 14:03
Awesome. I have changed the recording interval to 120 seconds and restarted it; I will check the result tomorrow. – z c Jun 07 '13 at 14:17
OK, I have checked the result and found that cron is the culprit:

PID  RDDSK  WRDSK  WCANCL  DSK  CMD   1/5
722  3420K     4K      0K  98%  cron

But this is vague. I have 13 items in my daily cron, so how do I go about narrowing down which one is the problem? – z c Jun 08 '13 at 06:52
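To narrow the surge down to a single script, one option is to run each /etc/cron.daily script by hand, one at a time, and compare the block device's read and write counters in /proc/diskstats before and after each one. This is a sketch under several assumptions: the function names are hypothetical; the root volume is assumed to appear as `xvda1` (check /proc/diskstats on your instance); the counters are device-wide, so the instance should otherwise be idle; and the scripts will have their normal side effects, such as rotating logs or updating package indexes. A second run may also do less I/O than the scheduled one because of warm caches. `run-parts --test /etc/cron.daily` lists the scripts run-parts would actually execute, which helps check the name filter below.

```shell
# Sketch: run each cron.daily script by hand and report how much the chosen
# block device read and wrote while it ran. Assumptions: the function names
# are hypothetical, and the device name (default xvda1) must be adjusted.

# Print "sectors_read sectors_written" for one device. In /proc/diskstats,
# field 3 is the device name, field 6 sectors read and field 10 sectors
# written (always 512-byte units). Prints "0 0" if the device is absent.
disk_sectors() {
    awk -v dev="$1" '$3 == dev { print $6, $10; found = 1 }
        END { if (!found) print 0, 0 }' /proc/diskstats
}

measure_cron_scripts() {
    dir=${1:-/etc/cron.daily}
    dev=${2:-xvda1}
    for script in "$dir"/*; do
        name=$(basename "$script")
        # Mimic run-parts' default filter: names made only of ASCII
        # letters, digits, underscores and hyphens (skips *.dpkg-old etc.).
        case "$name" in
            *[!A-Za-z0-9_-]*) continue ;;
        esac
        [ -f "$script" ] && [ -x "$script" ] || continue
        set -- $(disk_sectors "$dev")
        r0=$1 w0=$2
        "$script" > /dev/null 2>&1 || true    # ignore the script's exit status
        sync                                  # flush dirty pages so writes are counted
        set -- $(disk_sectors "$dev")
        echo "$name read_bytes=$(( ($1 - r0) * 512 )) written_bytes=$(( ($2 - w0) * 512 ))"
    done
}
```

Run as root, for example `measure_cron_scripts /etc/cron.daily xvda1`. A script whose read_bytes figure is far larger than the others is the likely culprit, and atop's per-process view can confirm it on the next scheduled run.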
