I'm running a server on EC2 (Cent OS, 2.6.35.14, x86_64), and I recently went over my 1 million I/Os per month quota, which is absurd, as my disk utilization shouldn't be anywhere near that. (I'm not running any services that should require much disk access.... or even any disk access)
My first thought was to fire up iotop, and see if there were any processes continually writing out to disk. iotop showed me that a process named jbd2 was writing out to disk more than once a minute.
After some googling around, it appeared that the problem is either a kernel bug, or some daemon touching the disk regularly.
I installed inotify-tools and started an inotifywait on the entire filesystem;
sudo /usr/local/bin/inotifywait -m -r /!(dev|proc)
And what this showed is that /etc/passwd is being opened, accessed, and then closed, multiple times a minute. Without me doing anything else on the system! inotify doesn't tell you what process is doing the touching, but I installed audit (http://people.redhat.com/sgrubb/audit/), setup some logging on /etc/passwd and let it run for a little while, and then took a look at the logs, but all they tell me is that it's being accessed by sudo: (username redacted)
type=SYSCALL msg=audit(10/03/2011 17:48:30.493:260) : arch=x86_64 syscall=open success=yes exit=4 a0=7f809205669a a1=80000 a2=1b6 a3=0 items=1 ppid=6466 pid=6467 auid=**** uid=root gid=**** euid=root suid=root fsuid=root egid=**** sgid=**** fsgid=**** tty=pts0 ses=79 comm=sudo exe=/usr/bin/sudo key=(null)
type=SYSCALL msg=audit(10/03/2011 17:48:30.493:261) : arch=x86_64 syscall=open success=yes exit=4 a0=7f809205669a a1=80000 a2=1b6 a3=0 items=1 ppid=6466 pid=6467 auid=**** uid=root gid=**** euid=root suid=root fsuid=root egid=**** sgid=**** fsgid=**** tty=pts0 ses=79 comm=sudo exe=/usr/bin/sudo key=(null)
type=SYSCALL msg=audit(10/03/2011 17:48:30.488:256) : arch=x86_64 syscall=open success=yes exit=3 a0=7f809205669a a1=80000 a2=1b6 a3=0 items=1 ppid=6441 pid=6466 auid=**** uid=**** gid=**** euid=root suid=root fsuid=root egid=**** sgid=**** fsgid=**** tty=pts0 ses=79 comm=sudo exe=/usr/bin/sudo key=(null)
...
Is there any way to narrow this down further? I have nothing in my cron files, and no other services that I can think of that would be causing this:
chkconfig | grep on
atd 0:off 1:off 2:off 3:on 4:on 5:on 6:off
cgconfig 0:off 1:off 2:off 3:off 4:off 5:off 6:off
cloud-init 0:off 1:off 2:on 3:on 4:on 5:on 6:off
cloud-init-user-scripts 0:off 1:off 2:on 3:on 4:on 5:on 6:off
conman 0:off 1:off 2:off 3:off 4:off 5:off 6:off
crond 0:off 1:off 2:on 3:on 4:on 5:on 6:off
iptables 0:off 1:off 2:on 3:on 4:on 5:on 6:off
irqbalance 0:off 1:off 2:off 3:on 4:on 5:on 6:off
lvm2-monitor 0:off 1:on 2:on 3:on 4:on 5:on 6:off
mdmonitor 0:off 1:off 2:on 3:on 4:on 5:on 6:off
netconsole 0:off 1:off 2:off 3:off 4:off 5:off 6:off
netfs 0:off 1:off 2:off 3:on 4:on 5:on 6:off
network 0:off 1:off 2:on 3:on 4:on 5:on 6:off
ntpd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
restorecond 0:off 1:off 2:off 3:off 4:off 5:off 6:off
rsyslog 0:off 1:off 2:on 3:on 4:on 5:on 6:off
sendmail 0:off 1:off 2:on 3:on 4:on 5:on 6:off
sshd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
udev-post 0:off 1:on 2:on 3:on 4:on 5:on 6:off
yum-updatesd 0:off 1:off 2:on 3:on 4:on 5:on 6:off