
This is hard to explain succinctly, but my server runs out of physical RAM and starts swapping with only a few Apache/PHP processes running. To combat this we configured fcgid to limit how many PHP processes it spawns, but this means we start serving 503 errors under only very moderate load.

If you look at the running processes they don't seem to account for the amount of RAM the machine reports as used.

I'm aware of the case where Linux appears to have no available RAM because most of it is marked as cached or buffered, but that doesn't seem to be what's happening here.

Each process seems to have a huge virtual memory size (even though swap usage is very low), but I'm not sure whether this is related or something to worry about.

FYI, it's currently running Apache with the worker MPM and mod_fcgid, but the same problems occurred with Apache using the prefork MPM.

It feels like the machine vastly overestimates how much RAM it is using (or underestimates how much is free).
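
To put a number on that gap, here is a minimal Python sketch (not part of the original diagnostics). It takes a handful of fields copied from the /proc/meminfo output in this post and compares memory used outside the buffers/page cache with the memory the kernel attributes to processes and its own bookkeeping. The helper names `parse_meminfo` and `memory_gap_kb` are made up for illustration, and the choice of which fields count as "accounted" is an assumption, not an exact kernel accounting.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])
    return values


def memory_gap_kb(info):
    """Return (used, accounted, gap) in kB.

    used      = RAM in use, excluding free memory, buffers and page cache
    accounted = anonymous pages plus a few kernel categories (an assumed,
                rough selection of fields)
    """
    used = info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]
    accounted = (info["AnonPages"] + info["Slab"]
                 + info["PageTables"] + info["KernelStack"])
    return used, accounted, used - accounted


# Fields copied from the /proc/meminfo output in this post.
SAMPLE = """\
MemTotal:        1514952 kB
MemFree:           88712 kB
Buffers:           15400 kB
Cached:            61864 kB
AnonPages:        287268 kB
Slab:              23140 kB
KernelStack:        2032 kB
PageTables:         6344 kB
"""

info = parse_meminfo(SAMPLE)
used, accounted, gap = memory_gap_kb(info)
print("used (excluding buffers/cache): %d kB" % used)
print("accounted for:                  %d kB" % accounted)
print("unexplained:                    %d kB" % gap)
```

With these figures, roughly 1 GB of the "used" memory is not explained by the categories summed here. On a live machine the same function can be fed the contents of /proc/meminfo directly.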

Hopefully people can understand my handwavey explanation. Please ask if there is more info I can provide.

Here are some stats from the server in question (all taken within a minute of each other):

# top

top - 12:29:03 up 7 days, 22:24,  2 users,  load average: 0.45, 0.50, 0.50
Tasks: 103 total,   1 running, 102 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.7%us,  0.4%sy,  0.0%ni, 94.0%id,  0.8%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1514952k total,  1417244k used,    97708k free,    15236k buffers
Swap:  3681012k total,    90324k used,  3590688k free,    61156k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                           
28863 www-data  20   0  180m  49m 3644 S    0  3.4   1:13.07 php5                                              
28862 www-data  20   0  179m  49m 3668 S    0  3.3   0:48.49 php5                                              
27699 www-data  20   0  180m  48m 3584 S    0  3.3   1:44.68 php5                                              
28865 www-data  20   0  177m  47m 3644 S    0  3.2   1:13.07 php5                                              
27698 www-data  20   0  180m  47m 4132 S    0  3.2   1:40.05 php5                                              
 3203 mysql     20   0  494m  30m 3828 S    0  2.1  75:01.59 mysqld                                            
28777 www-data  20   0  174m 7928 2032 S    0  0.5   0:02.48 apache2                                           
28748 www-data  20   0  174m 6952 1480 S    0  0.5   0:02.58 apache2                                           
28776 www-data  20   0  110m 6744 1480 S    0  0.4   0:02.12 apache2                                           
28959 paul      20   0 36756 5484 2316 S    0  0.4   0:00.37 mysql                                             
22846 root       0 -20 14052 3820 2628 S    0  0.3   0:00.85 atop                                              
28923 root      20   0 70616 3140 2416 S    0  0.2   0:00.04 sshd                                              
24982 www-data  20   0  177m 2820 2816 S    0  0.2   0:08.38 php5                                              
28935 paul      20   0 19428 2164 1584 S    0  0.1   0:00.01 bash                                              
  933 root      20   0 58592 1724 1420 S    0  0.1   1:31.39 vmtoolsd                                          
27451 root      20   0 19568 1608 1220 S    0  0.1   0:00.05 bash                                              
28934 paul      20   0 70616 1580  828 S    0  0.1   0:00.00 sshd                                              
24471 root      20   0  103m 1356  904 S    0  0.1   0:00.09 apache2                                           
29086 root      20   0 19220 1284  964 R    2  0.1   0:00.01 top                                               
28836 postfix   20   0 39272 1272  824 S    0  0.1   0:00.00 pickup                                            
24473 www-data  20   0  102m 1208  692 S    0  0.1   0:00.04 apache2                                           
  717 syslog    20   0  187m 1140  896 S    0  0.1   0:00.79 rsyslogd                                          
    1 root      20   0 23580  972  672 S    0  0.1   0:01.53 init                                              
 1174 postfix   20   0 39432  916  796 S    0  0.1   0:00.21 qmgr                                              
 1165 root      20   0 37208  908  792 S    0  0.1   0:00.40 master                                            
27430 root      20   0 70616  848  844 S    0  0.1   0:00.04 sshd                                              
27442 jack      20   0 19464  812  808 S    0  0.1   0:00.01 bash                                              
  960 root      20   0 21076  740  652 S    0  0.0   0:00.48 cron                                              
27450 root      20   0 37052  740  736 S    0  0.0   0:00.02 su                                                
  685 ntpd      20   0 20312  660  612 S    0  0.0   0:00.08 ntpd                                              
27441 jack      20   0 70616  600  480 S    0  0.0   0:00.04 sshd                                              
  725 root      20   0 49260  592  484 S    0  0.0   0:00.08 sshd                                              
  684 root      20   0 24532  548  524 S    0  0.0   0:00.02 ntpd                                              
  944 root      20   0  6080  524  520 S    0  0.0   0:00.00 getty                                             
  946 root      20   0  6080  524  520 S    0  0.0   0:00.00 getty                                             
  949 root      20   0  6080  524  520 S    0  0.0   0:00.00 getty                                             
  951 root      20   0  6080  524  520 S    0  0.0   0:00.00 getty                                             
  956 root      20   0  6080  524  520 S    0  0.0   0:00.00 getty                                             
 1200 root      20   0  6080  524  520 S    0  0.0   0:00.00 getty                                             
  336 root      20   0 17168  296  292 S    0  0.0   0:00.10 upstart-udev-br                                   
  338 root      16  -4 16972  268  264 S    0  0.0   0:00.14 udevd                                             
  443 root      18  -2 16880  184  180 S    0  0.0   0:00.00 udevd                                             
  442 root      18  -2 16880  168  164 S    0  0.0   0:00.01 udevd                                             
    2 root      20   0     0    0    0 S    0  0.0   0:00.01 kthreadd                                          
    3 root      RT   0     0    0    0 S    0  0.0   0:00.47 migration/0                                       
    4 root      20   0     0    0    0 S    0  0.0   0:00.26 ksoftirqd/0                                       
    5 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0                                        
    6 root      RT   0     0    0    0 S    0  0.0   0:00.68 migration/1                                       
    7 root      20   0     0    0    0 S    0  0.0   0:02.21 ksoftirqd/1 

# free
             total       used       free     shared    buffers     cached
Mem:       1514952    1432132      82820          0      15360      61712
-/+ buffers/cache:    1355060     159892
Swap:      3681012      90324    3590688


# cat /proc/meminfo
MemTotal:        1514952 kB
MemFree:           88712 kB
Buffers:           15400 kB
Cached:            61864 kB
SwapCached:         9480 kB
Active:           247996 kB
Inactive:         117416 kB
Active(anon):     215676 kB
Inactive(anon):    75200 kB
Active(file):      32320 kB
Inactive(file):    42216 kB
Unevictable:        3816 kB
Mlocked:            3816 kB
SwapTotal:       3681012 kB
SwapFree:        3590696 kB
Dirty:               332 kB
Writeback:             0 kB
AnonPages:        287268 kB
Mapped:            14364 kB
Shmem:               188 kB
Slab:              23140 kB
SReclaimable:      10060 kB
SUnreclaim:        13080 kB
KernelStack:        2032 kB
PageTables:         6344 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4438488 kB
Committed_AS:     592516 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      275416 kB
VmallocChunk:   34359457944 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        8192 kB
DirectMap2M:     1540096 kB




# ps ax -o rss,size,vsize,cmd | grep -v "0 \["
  RSS    SZ    VSZ CMD
  972   564  23580 /sbin/init
  296   556  17168 upstart-udev-bridge --daemon
  268   504  16972 udevd --daemon
  168   412  16880 udevd --daemon
  184   412  16880 udevd --daemon
  548   304  24532 /usr/sbin/ntpd
  660   296  20312 /usr/sbin/ntpd
 1140 163216 192416 rsyslogd -c4
  592   564  49260 /usr/sbin/sshd
 1724   964  58592 /usr/sbin/vmtoolsd
  524   280   6080 /sbin/getty -8 38400 tty4
  524   280   6080 /sbin/getty -8 38400 tty5
  524   280   6080 /sbin/getty -8 38400 tty2
  524   280   6080 /sbin/getty -8 38400 tty3
  524   280   6080 /sbin/getty -8 38400 tty6
  740   480  21076 cron
  908   340  37208 /usr/lib/postfix/master
  916   448  39432 qmgr -l -t fifo -u
  524   280   6080 /sbin/getty -8 38400 tty1
31532 466492 506696 /usr/sbin/mysqld
 3820  1116  14052 /usr/bin/atop -a -w /var/log/atop.log 600
 1364  2528 106036 /usr/sbin/apache2 -k start
 1208  2528 105264 /usr/sbin/apache2 -k start
 2820 50916 182080 /usr/lib/cgi-bin/php5
  848   756  70616 sshd: jack [priv]
  600   756  70616 sshd: jack@pts/0 
  812   572  19464 -bash
  740   532  37052 su
 1608   676  19568 bash
48216 53332 184496 /usr/lib/cgi-bin/php5
49524 55380 184472 /usr/lib/cgi-bin/php5
 7700 74448 180096 /usr/sbin/apache2 -k start
 7460 74352 180000 /usr/sbin/apache2 -k start
 8340 74232 179880 /usr/sbin/apache2 -k start
 1272   340  39272 pickup -l -t fifo -u -c
49256 53332 182424 /usr/lib/cgi-bin/php5
51060 55376 184468 /usr/lib/cgi-bin/php5
48960 53072 182164 /usr/lib/cgi-bin/php5
 3140   756  70616 sshd: paul [priv]
 1580   756  70616 sshd: paul@pts/1 
 2164   536  19428 -bash
 5484  3328  36756 mysql -uroot -px xxxxxxxx -Dkambos_db
 4812 333244 438892 /usr/sbin/apache2 -k start
 4848 398788 504436 /usr/sbin/apache2 -k start
 2168  4364 110012 /usr/sbin/apache2 -k start
 1012   612   6828 ps ax -o rss,size,vsize,cmd
  928   272   7628 grep --color=auto -v 0 \[
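
As a rough cross-check of the listing above, the RSS column can be summed with a few lines of Python. This is only a sketch: `total_rss_kb` is a hypothetical helper, the sample uses just three rows copied from the listing, and RSS counts shared pages once per process, so the total tends to overstate real usage rather than understate it.

```python
def total_rss_kb(ps_output):
    """Sum the first column (RSS, in kB) of `ps -o rss,...` output.

    Lines whose first field is not a number, such as the header, are skipped.
    """
    total = 0
    for line in ps_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            total += int(fields[0])
    return total


# Three rows copied from the ps listing above, plus its header.
SAMPLE = """\
  RSS    SZ    VSZ CMD
48216 53332 184496 /usr/lib/cgi-bin/php5
49524 55380 184472 /usr/lib/cgi-bin/php5
31532 466492 506696 /usr/sbin/mysqld
"""

sample_total = total_rss_kb(SAMPLE)
print("RSS of sampled processes: %d kB" % sample_total)
```

Feeding the full listing to the same function gives the per-process total to compare against the "used" figure reported by free.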

# uname -a
Linux rubik.titaninteractive.com.au 2.6.32-24-server #43-Ubuntu SMP Thu Sep 16 16:05:42 UTC 2010 x86_64 GNU/Linux
beetlefeet

1 Answer


Have you specified the ThreadStackSize directive in Apache?

If not, it falls back to the system default, usually 8192 kB on 32-bit Linux. It might be different on 64-bit, so check "stack size" in the output of ulimit -a. You don't need 8 MB per thread; 128 kB is likely plenty.

For a start, try ThreadStackSize 131072 (see the MPM docs), or add "ulimit -s 128" to your Apache startup script. You might need to adjust this figure a bit, depending on your configuration.
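
To give a feel for the numbers, here is a small Python sketch of the arithmetic. The thread count of 25 per Apache child is an assumption for illustration (check ThreadsPerChild in your own configuration), and `stack_reservation_bytes` is a made-up helper. Note that this is address space reserved for stacks, which shows up in the virtual size (VIRT/VSZ); pages only become resident once a thread actually touches them.

```python
KIB = 1024
MIB = 1024 * KIB


def stack_reservation_bytes(threads, stack_size_bytes):
    """Address space reserved for thread stacks in one process."""
    return threads * stack_size_bytes


THREADS_PER_CHILD = 25         # assumed value, check your own config
DEFAULT_STACK = 8192 * KIB     # the 8192 kB system default mentioned above
TUNED_STACK = 131072           # the suggested ThreadStackSize, in bytes

default_total = stack_reservation_bytes(THREADS_PER_CHILD, DEFAULT_STACK)
tuned_total = stack_reservation_bytes(THREADS_PER_CHILD, TUNED_STACK)

print("default stacks: %d MiB per child" % (default_total // MIB))
print("tuned stacks:   %d KiB per child" % (tuned_total // KIB))
```

Under these assumptions the default reserves 200 MiB of stack address space per child process, against 3200 KiB with the smaller setting.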

I wouldn't be surprised if a similar issue were behind the very large PHP instances (if it isn't due to APC, as was suggested above). I'm not very familiar with mod_fcgid, so I'll leave that issue to others.

bitslave