I have a virtual server hosting multiple small sites in a LAMP environment. The system was originally Debian 4 and was later dist-upgraded to Debian 5. It has 1 GB of dedicated and 1 GB of shared RAM, and 20 GB of storage space.
Recently the system has started to show an alarming tendency to slow down randomly when writing files to the file system (/dev/vzfs). When these bursts of slowness happen, the load starts climbing but the processor stays mostly idle; even the I/O wait percentage stays at or near zero. Here's an overview from the most recent occurrence, while saving a 1 kB Apache configuration file that took about 20 seconds:
top - 18:05:38 up 274 days, 11:50, 4 users, load average: 0.71, 0.25, 0.08
Tasks: 54 total, 1 running, 53 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2097152k total, 471044k used, 1626108k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 0k cached
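
To get harder numbers the next time this happens, here's a rough sketch of what I plan to run to measure the write latency directly (assuming dd and strace are installed on the container; /tmp/writetest is just a placeholder path):

# Time a small synchronous write, about the size of the config file I was saving
dd if=/dev/zero of=/tmp/writetest bs=1k count=1 oflag=sync

# Show per-syscall timings (-T) to see which call actually stalls
strace -T -e trace=open,write,close sh -c 'echo test > /tmp/writetest'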
I'm mainly a coder and don't have much system administration experience, so I'm not sure where I should start looking. Any pointers are much appreciated.
Update: The standard system logs didn't contain any useful information regarding this case, so I contacted the service provider and asked whether the host system is overloaded. They responded that the host's load is normal, but that my container seems to be occasionally exceeding its resource allocations. Here are the lines from /proc/user_beancounters that have a failcnt > 0:
uid  resource      held    maxheld  barrier  limit    failcnt
     shmpages        9744    19470    19567    19567         1
     tcpsndbuf     306232  2453448  2449232  3598712  42347113
     tcprcvbuf     299568  2459056  2449232  3598712      1640
     othersockbuf  101640   843592   844366  1481926       140
     numfile         3100     6000     6000     6000        11
The one I'm exceeding the most, by a clear margin, is tcpsndbuf. However, I'm guessing that shouldn't affect file system performance. numfile has been exceeded 11 times (is this number an all-time total or only since the last reboot?) and sounds like something that could be the cause of the problem. Half of the open files seem to belong to apache2, which keeps every log file and .so open in each of its processes. Maybe switching to Lighttpd or Nginx would help? I'll check the beancounters the next time the system slows down and see if that gives any clues, using something like the commands below.
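
As a rough sketch (assuming awk and lsof are available in the container), the first command keeps the beancounters header plus any row whose last column, failcnt, is non-zero, and the second counts open files per command name to see who is eating into numfile:

# Show only beancounter rows that have failed at least once
awk 'NR <= 2 || $NF + 0 > 0' /proc/user_beancounters

# Count open files per command name, highest counts first
lsof 2>/dev/null | awk '{print $1}' | sort | uniq -c | sort -rn | head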