4

I'm running a server on EC2 (Cent OS, 2.6.35.14, x86_64), and I recently went over my 1 million I/Os per month quota, which is absurd, as my disk utilization shouldn't be anywhere near that. (I'm not running any services that should require much disk access.... or even any disk access)

My first thought was to fire up iotop, and see if there were any processes continually writing out to disk. iotop showed me that a process named jbd2 was writing out to disk more than once a minute.

After some googling around, it appeared that the problem is either a kernel bug, or some daemon touching the disk regularly.

I installed inotify-tools and started an inotifywait on the entire filesystem;

sudo /usr/local/bin/inotifywait -m -r /!(dev|proc)

And what this showed is that /etc/passwd is being opened, accessed, and then closed, multiple times a minute. Without me doing anything else on the system! inotify doesn't tell you what process is doing the touching, but I installed audit (http://people.redhat.com/sgrubb/audit/), setup some logging on /etc/passwd and let it run for a little while, and then took a look at the logs, but all they tell me is that it's being accessed by sudo: (username redacted)

type=SYSCALL msg=audit(10/03/2011 17:48:30.493:260) : arch=x86_64 syscall=open success=yes exit=4 a0=7f809205669a a1=80000 a2=1b6 a3=0 items=1 ppid=6466 pid=6467 auid=**** uid=root gid=**** euid=root suid=root fsuid=root egid=**** sgid=**** fsgid=**** tty=pts0 ses=79 comm=sudo exe=/usr/bin/sudo key=(null) 
type=SYSCALL msg=audit(10/03/2011 17:48:30.493:261) : arch=x86_64 syscall=open success=yes exit=4 a0=7f809205669a a1=80000 a2=1b6 a3=0 items=1 ppid=6466 pid=6467 auid=**** uid=root gid=**** euid=root suid=root fsuid=root egid=**** sgid=**** fsgid=**** tty=pts0 ses=79 comm=sudo exe=/usr/bin/sudo key=(null) 
type=SYSCALL msg=audit(10/03/2011 17:48:30.488:256) : arch=x86_64 syscall=open success=yes exit=3 a0=7f809205669a a1=80000 a2=1b6 a3=0 items=1 ppid=6441 pid=6466 auid=**** uid=**** gid=**** euid=root suid=root fsuid=root egid=**** sgid=**** fsgid=**** tty=pts0 ses=79 comm=sudo exe=/usr/bin/sudo key=(null)
...

Is there any way to narrow this down further? I have nothing in my cron files, and no other services that I can think of that would be causing this:

chkconfig | grep on
atd             0:off   1:off   2:off   3:on    4:on    5:on    6:off
cgconfig        0:off   1:off   2:off   3:off   4:off   5:off   6:off
cloud-init      0:off   1:off   2:on    3:on    4:on    5:on    6:off
cloud-init-user-scripts 0:off   1:off   2:on    3:on    4:on    5:on    6:off
conman          0:off   1:off   2:off   3:off   4:off   5:off   6:off
crond           0:off   1:off   2:on    3:on    4:on    5:on    6:off
iptables        0:off   1:off   2:on    3:on    4:on    5:on    6:off
irqbalance      0:off   1:off   2:off   3:on    4:on    5:on    6:off
lvm2-monitor    0:off   1:on    2:on    3:on    4:on    5:on    6:off
mdmonitor       0:off   1:off   2:on    3:on    4:on    5:on    6:off
netconsole      0:off   1:off   2:off   3:off   4:off   5:off   6:off
netfs           0:off   1:off   2:off   3:on    4:on    5:on    6:off
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off
ntpd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
restorecond     0:off   1:off   2:off   3:off   4:off   5:off   6:off
rsyslog         0:off   1:off   2:on    3:on    4:on    5:on    6:off
sendmail        0:off   1:off   2:on    3:on    4:on    5:on    6:off
sshd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
udev-post       0:off   1:on    2:on    3:on    4:on    5:on    6:off
yum-updatesd    0:off   1:off   2:on    3:on    4:on    5:on    6:off
Bart De Vos
  • 17,911
  • 6
  • 63
  • 82
staticfloat
  • 257
  • 1
  • 3
  • 8
  • What does the sudolog tell you? You may want to enable it and then update the post with the output. – Rilindo Oct 04 '11 at 02:50
  • I don't know the answer, but may I pause to praise the question? A clearer example of the virtues of esr's "How to ask questions the smart way" I have not recently seen; I wish all questions on SF were written this well. – MadHatter Oct 04 '11 at 08:02

2 Answers2

5

As you are using CentOS, you should be able to find out what sudo is doing by looking in /var/log/secure e.g.

sudo tail /var/log/secure

Oct 4 03:45:44 ec2-centos-instance sudo: iain : TTY=pts/0 ; PWD=/home/iain ; USER=root ; COMMAND=/usr/bin/tail /var/log/secure

Edit: Updating with answer from comments

Enabling block dump with:echo 1 > /proc/sys/vm/block_dump and take a look at it with dmesg helped track down which processes were accessing the disk. Much more reliable than iotop. Turns out, sar was running continually, writing out to /var/log/sa/saXX so I disabled that in cron.d, and all is well again

Jason Axelson
  • 334
  • 1
  • 5
  • 17
user9517
  • 115,471
  • 20
  • 215
  • 297
  • I may be barking up the wrong tree here, as that log file doesn't contain anything of interest. I'm thinking it might just be random ssh login attempts, so the source of all the I/Os, (much greater than the ssh login attempts would account for) might instead be a result of a kernel bug. I'll attempt to update my kernel and see how that goes! – staticfloat Oct 04 '11 at 19:03
  • The output I posted in my answer is from and EC2 CentOS instance. Try `grep sudo /var/log/secure` to see what sudo commands are being logged. – user9517 Oct 04 '11 at 19:18
  • Sorry, by "nothing of interest", I meant it only contains commands that I would expect, nothing extremely out of the ordinary. Indeed, the most interesting thing was the bruteforce ssh logins coming in, which I believe may account for the /etc/passwd accesses – staticfloat Oct 04 '11 at 20:03
  • 2
    I'm marking this as the answer, as it helped me the most, but what ended up actually clinching this for me was enabling block dumping (`echo 1 > /proc/sys/vm/block_dump`) and take a look at it with dmesg to see which processes were accessing the disk. Much more reliable than iotop. Turns out, **sar** was running continually, writing out to /var/log/sa/saXX so I disabled that in cron.d, and all is well again. – staticfloat Oct 05 '11 at 14:32
3

It sounds like you have a script or program that is using sudo to try and do stuff as root. You can enable logging in sudo to figure out what they are trying to do and come up with a better solution (perhaps a setuid binary is in order). Here is more info on sudo (like how to enable logging):

http://aplawrence.com/Basics/sudo.html

polynomial
  • 4,016
  • 14
  • 24