This is hard to explain succinctly, but my server runs out of physical RAM and starts swapping with only a few Apache/PHP processes running. To combat this we configured mod_fcgid to limit how many PHP processes it spawns, but that means we start serving 503 errors under only very moderate load.
If you look at the running processes they don't seem to account for the amount of RAM the machine reports as used.
I'm aware of the common confusion where Linux appears to have no available RAM because most of it is marked as cached or buffered, but that doesn't seem to be the case here.
Each process has a huge virtual memory size (even though swap usage is very low), and I'm not sure whether that is related or something to worry about.
FYI, it's currently running Apache with the worker MPM and mod_fcgid, but the same problem occurred with Apache using the prefork MPM.
It feels like the machine vastly overestimates how much RAM it is using (or underestimates how much is free).
Hopefully people can follow my hand-wavy explanation. Please ask if there is more info I can provide.
Here are some stats from the server in question (all taken within a minute of each other):
# top
top - 12:29:03 up 7 days, 22:24, 2 users, load average: 0.45, 0.50, 0.50
Tasks: 103 total, 1 running, 102 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.7%us, 0.4%sy, 0.0%ni, 94.0%id, 0.8%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1514952k total, 1417244k used, 97708k free, 15236k buffers
Swap: 3681012k total, 90324k used, 3590688k free, 61156k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28863 www-data 20 0 180m 49m 3644 S 0 3.4 1:13.07 php5
28862 www-data 20 0 179m 49m 3668 S 0 3.3 0:48.49 php5
27699 www-data 20 0 180m 48m 3584 S 0 3.3 1:44.68 php5
28865 www-data 20 0 177m 47m 3644 S 0 3.2 1:13.07 php5
27698 www-data 20 0 180m 47m 4132 S 0 3.2 1:40.05 php5
3203 mysql 20 0 494m 30m 3828 S 0 2.1 75:01.59 mysqld
28777 www-data 20 0 174m 7928 2032 S 0 0.5 0:02.48 apache2
28748 www-data 20 0 174m 6952 1480 S 0 0.5 0:02.58 apache2
28776 www-data 20 0 110m 6744 1480 S 0 0.4 0:02.12 apache2
28959 paul 20 0 36756 5484 2316 S 0 0.4 0:00.37 mysql
22846 root 0 -20 14052 3820 2628 S 0 0.3 0:00.85 atop
28923 root 20 0 70616 3140 2416 S 0 0.2 0:00.04 sshd
24982 www-data 20 0 177m 2820 2816 S 0 0.2 0:08.38 php5
28935 paul 20 0 19428 2164 1584 S 0 0.1 0:00.01 bash
933 root 20 0 58592 1724 1420 S 0 0.1 1:31.39 vmtoolsd
27451 root 20 0 19568 1608 1220 S 0 0.1 0:00.05 bash
28934 paul 20 0 70616 1580 828 S 0 0.1 0:00.00 sshd
24471 root 20 0 103m 1356 904 S 0 0.1 0:00.09 apache2
29086 root 20 0 19220 1284 964 R 2 0.1 0:00.01 top
28836 postfix 20 0 39272 1272 824 S 0 0.1 0:00.00 pickup
24473 www-data 20 0 102m 1208 692 S 0 0.1 0:00.04 apache2
717 syslog 20 0 187m 1140 896 S 0 0.1 0:00.79 rsyslogd
1 root 20 0 23580 972 672 S 0 0.1 0:01.53 init
1174 postfix 20 0 39432 916 796 S 0 0.1 0:00.21 qmgr
1165 root 20 0 37208 908 792 S 0 0.1 0:00.40 master
27430 root 20 0 70616 848 844 S 0 0.1 0:00.04 sshd
27442 jack 20 0 19464 812 808 S 0 0.1 0:00.01 bash
960 root 20 0 21076 740 652 S 0 0.0 0:00.48 cron
27450 root 20 0 37052 740 736 S 0 0.0 0:00.02 su
685 ntpd 20 0 20312 660 612 S 0 0.0 0:00.08 ntpd
27441 jack 20 0 70616 600 480 S 0 0.0 0:00.04 sshd
725 root 20 0 49260 592 484 S 0 0.0 0:00.08 sshd
684 root 20 0 24532 548 524 S 0 0.0 0:00.02 ntpd
944 root 20 0 6080 524 520 S 0 0.0 0:00.00 getty
946 root 20 0 6080 524 520 S 0 0.0 0:00.00 getty
949 root 20 0 6080 524 520 S 0 0.0 0:00.00 getty
951 root 20 0 6080 524 520 S 0 0.0 0:00.00 getty
956 root 20 0 6080 524 520 S 0 0.0 0:00.00 getty
1200 root 20 0 6080 524 520 S 0 0.0 0:00.00 getty
336 root 20 0 17168 296 292 S 0 0.0 0:00.10 upstart-udev-br
338 root 16 -4 16972 268 264 S 0 0.0 0:00.14 udevd
443 root 18 -2 16880 184 180 S 0 0.0 0:00.00 udevd
442 root 18 -2 16880 168 164 S 0 0.0 0:00.01 udevd
2 root 20 0 0 0 0 S 0 0.0 0:00.01 kthreadd
3 root RT 0 0 0 0 S 0 0.0 0:00.47 migration/0
4 root 20 0 0 0 0 S 0 0.0 0:00.26 ksoftirqd/0
5 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0
6 root RT 0 0 0 0 S 0 0.0 0:00.68 migration/1
7 root 20 0 0 0 0 S 0 0.0 0:02.21 ksoftirqd/1
# free
total used free shared buffers cached
Mem: 1514952 1432132 82820 0 15360 61712
-/+ buffers/cache: 1355060 159892
Swap: 3681012 90324 3590688
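To show why I don't think this is the usual "it's all cache" situation, here is a minimal sketch that recomputes the `-/+ buffers/cache` row from the `Mem:` row above. The function name `buffers_cache_row` is just my own helper, not part of any tool.

```python
def buffers_cache_row(used_kb, free_kb, buffers_kb, cached_kb):
    """Recompute free's '-/+ buffers/cache' row from its 'Mem:' row (all in kB)."""
    # Memory in use once buffers and page cache are excluded.
    used_without_cache = used_kb - buffers_kb - cached_kb
    # Memory that is free or could be reclaimed from buffers/cache.
    free_with_cache = free_kb + buffers_kb + cached_kb
    return used_without_cache, free_with_cache


# Values copied from the "Mem:" row of the free output above.
used_without_cache, free_with_cache = buffers_cache_row(
    used_kb=1432132, free_kb=82820, buffers_kb=15360, cached_kb=61712
)
print(used_without_cache, free_with_cache)  # 1355060 159892
```

Buffers and cache together are only about 77 MB here, so excluding them barely changes the "used" figure.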
# cat /proc/meminfo
MemTotal: 1514952 kB
MemFree: 88712 kB
Buffers: 15400 kB
Cached: 61864 kB
SwapCached: 9480 kB
Active: 247996 kB
Inactive: 117416 kB
Active(anon): 215676 kB
Inactive(anon): 75200 kB
Active(file): 32320 kB
Inactive(file): 42216 kB
Unevictable: 3816 kB
Mlocked: 3816 kB
SwapTotal: 3681012 kB
SwapFree: 3590696 kB
Dirty: 332 kB
Writeback: 0 kB
AnonPages: 287268 kB
Mapped: 14364 kB
Shmem: 188 kB
Slab: 23140 kB
SReclaimable: 10060 kB
SUnreclaim: 13080 kB
KernelStack: 2032 kB
PageTables: 6344 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 4438488 kB
Committed_AS: 592516 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 275416 kB
VmallocChunk: 34359457944 kB
HardwareCorrupted: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 8192 kB
DirectMap2M: 1540096 kB
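As a rough cross-check of the numbers above, this sketch subtracts the categories that `/proc/meminfo` itemises from the total in use. The choice of fields in `ACCOUNTED` is my own assumption about what counts as "explained" memory, and the helper names are hypothetical, so treat the result as an estimate only.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


# Assumption: these fields roughly cover process, cache and kernel bookkeeping memory.
ACCOUNTED = ("AnonPages", "Buffers", "Cached", "SwapCached",
             "Slab", "PageTables", "KernelStack")


def unaccounted_kb(fields):
    """In-use memory (MemTotal - MemFree) minus the itemised categories, in kB."""
    used = fields["MemTotal"] - fields["MemFree"]
    accounted = sum(fields.get(name, 0) for name in ACCOUNTED)
    return used - accounted


# Values copied from the /proc/meminfo output above.
sample = """\
MemTotal: 1514952 kB
MemFree: 88712 kB
Buffers: 15400 kB
Cached: 61864 kB
SwapCached: 9480 kB
AnonPages: 287268 kB
Slab: 23140 kB
KernelStack: 2032 kB
PageTables: 6344 kB
"""
print(unaccounted_kb(parse_meminfo(sample)))  # 1020712
```

By this estimate, roughly 1 GB of the memory reported as used is not attributed to any of these categories.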
# ps ax -o rss,size,vsize,cmd | grep -v "0 \["
RSS SZ VSZ CMD
972 564 23580 /sbin/init
296 556 17168 upstart-udev-bridge --daemon
268 504 16972 udevd --daemon
168 412 16880 udevd --daemon
184 412 16880 udevd --daemon
548 304 24532 /usr/sbin/ntpd
660 296 20312 /usr/sbin/ntpd
1140 163216 192416 rsyslogd -c4
592 564 49260 /usr/sbin/sshd
1724 964 58592 /usr/sbin/vmtoolsd
524 280 6080 /sbin/getty -8 38400 tty4
524 280 6080 /sbin/getty -8 38400 tty5
524 280 6080 /sbin/getty -8 38400 tty2
524 280 6080 /sbin/getty -8 38400 tty3
524 280 6080 /sbin/getty -8 38400 tty6
740 480 21076 cron
908 340 37208 /usr/lib/postfix/master
916 448 39432 qmgr -l -t fifo -u
524 280 6080 /sbin/getty -8 38400 tty1
31532 466492 506696 /usr/sbin/mysqld
3820 1116 14052 /usr/bin/atop -a -w /var/log/atop.log 600
1364 2528 106036 /usr/sbin/apache2 -k start
1208 2528 105264 /usr/sbin/apache2 -k start
2820 50916 182080 /usr/lib/cgi-bin/php5
848 756 70616 sshd: jack [priv]
600 756 70616 sshd: jack@pts/0
812 572 19464 -bash
740 532 37052 su
1608 676 19568 bash
48216 53332 184496 /usr/lib/cgi-bin/php5
49524 55380 184472 /usr/lib/cgi-bin/php5
7700 74448 180096 /usr/sbin/apache2 -k start
7460 74352 180000 /usr/sbin/apache2 -k start
8340 74232 179880 /usr/sbin/apache2 -k start
1272 340 39272 pickup -l -t fifo -u -c
49256 53332 182424 /usr/lib/cgi-bin/php5
51060 55376 184468 /usr/lib/cgi-bin/php5
48960 53072 182164 /usr/lib/cgi-bin/php5
3140 756 70616 sshd: paul [priv]
1580 756 70616 sshd: paul@pts/1
2164 536 19428 -bash
5484 3328 36756 mysql -uroot -px xxxxxxxx -Dkambos_db
4812 333244 438892 /usr/sbin/apache2 -k start
4848 398788 504436 /usr/sbin/apache2 -k start
2168 4364 110012 /usr/sbin/apache2 -k start
1012 612 6828 ps ax -o rss,size,vsize,cmd
928 272 7628 grep --color=auto -v 0 \[
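To compare what the processes hold against what the machine reports as used, something like the following could total the RSS and VSZ columns of the `ps` output. It is only a sketch: `sum_ps_columns` is my own helper name, the sample is three rows taken from the listing above rather than the whole thing, and summing RSS counts shared pages once per process, so the real figure would be somewhat lower.

```python
def sum_ps_columns(text):
    """Sum the RSS and VSZ columns (kB) of `ps ax -o rss,size,vsize,cmd` output."""
    rss_total = 0
    vsz_total = 0
    for line in text.splitlines():
        parts = line.split(None, 3)
        # Skip the header, blank lines and anything without three numeric columns.
        if len(parts) < 3 or not all(p.isdigit() for p in parts[:3]):
            continue
        rss_total += int(parts[0])
        vsz_total += int(parts[2])
    return rss_total, vsz_total


# Three rows copied from the ps output above, for illustration only.
sample = """\
RSS SZ VSZ CMD
31532 466492 506696 /usr/sbin/mysqld
48216 53332 184496 /usr/lib/cgi-bin/php5
7700 74448 180096 /usr/sbin/apache2 -k start
"""
print(sum_ps_columns(sample))  # (87448, 871288)
```

Even in this small sample the virtual sizes are about ten times the resident sizes, which is the gap I'm asking about.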
# uname -a
Linux rubik.titaninteractive.com.au 2.6.32-24-server #43-Ubuntu SMP Thu Sep 16 16:05:42 UTC 2010 x86_64 GNU/Linux