5

We have very strange problem on our Web-project.

We use:

2 Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
12 GB memory
We have about 20 hits per seconds. 4-5 requests per second are heavy – it is a search requests.
We use nginx + php-fpm (5.3.22)
MySQL server installed on another machine.
Most of time we have load average less than 10 and cpu usage about 50%
Sometimes we get cpu usage about 95% and after that load average grows to 50 and more!!!
You can see Load Average and CPU Usage here (my reputation low to send images here)
Load Average
CPU Usage

We have to reload php-fpm ( /etc/init.d/php-fpm reload) to normalize situation.
This can happens 4-5 times per day.
I tried to use strace to exam this situation.
Sorry for long logs! This output of command strace -cp PID
PID – is the random php-fpm process id (We start 100 php-fpm processes).
This two results in the moment with high cpu usage.

Process 17272 attached - interrupt to quit
Process 17272 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.56    0.008817         267        33           munmap
 13.38    0.001799         900         2           clone
  9.66    0.001299           2       589           read
  7.43    0.000999         125         8           mremap
  2.84    0.000382           1       559        96 access
  0.59    0.000080          40         2           waitpid
  0.29    0.000039           0       627           gettimeofday
  0.16    0.000022           0       346           write
  0.04    0.000006           0        56           getcwd
  0.04    0.000005           0       348           poll
  0.00    0.000000           0        55           open
  0.00    0.000000           0        69           close
  0.00    0.000000           0        17           chdir
  0.00    0.000000           0       189           time
  0.00    0.000000           0        28           lseek
  0.00    0.000000           0         2           pipe
  0.00    0.000000           0        17           times
  0.00    0.000000           0         8           brk
  0.00    0.000000           0         8           getrusage
  0.00    0.000000           0        18           setitimer
  0.00    0.000000           0         8           flock
  0.00    0.000000           0         1           nanosleep
  0.00    0.000000           0        11           rt_sigaction
  0.00    0.000000           0        13           rt_sigprocmask
  0.00    0.000000           0         6           pread64
  0.00    0.000000           0         7           pwrite64
  0.00    0.000000           0        33           mmap2
  0.00    0.000000           0        18         4 stat64
  0.00    0.000000           0        34           lstat64
  0.00    0.000000           0        92           fstat64
  0.00    0.000000           0        63           fcntl64
  0.00    0.000000           0        53           clock_gettime
  0.00    0.000000           0         1           socket
  0.00    0.000000           0         1         1 connect
  0.00    0.000000           0         9           accept
  0.00    0.000000           0         1           send
  0.00    0.000000           0        21           recv
  0.00    0.000000           0         9         1 shutdown
  0.00    0.000000           0         1           getsockopt
------ ----------- ----------- --------- --------- ----------------
100.00    0.013448                  3363       102 total



[root@hp-php ~]# strace -cp 30767
Process 30767 attached - interrupt to quit
Process 30767 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.88    0.016926         220        77           munmap
 29.06    0.009301           2      4343           read
  8.73    0.002794         466         6           clone
  3.59    0.001149           0      5598           time
  3.18    0.001017           0      3745           write
  1.12    0.000358           0      7316           gettimeofday
  0.64    0.000205           1       164           fcntl64
  0.39    0.000124          21         6           waitpid
  0.22    0.000070           0      1496       326 access
  0.13    0.000041           0      3769           poll
  0.03    0.000009           0       151           close
  0.02    0.000008           0       114           clock_gettime
  0.02    0.000007           0       110           getcwd
  0.00    0.000000           0       112           open
  0.00    0.000000           0        38           chdir
  0.00    0.000000           0        47           lseek
  0.00    0.000000           0         6           pipe
  0.00    0.000000           0        38           times
  0.00    0.000000           0       135           brk
  0.00    0.000000           0         3           ioctl
  0.00    0.000000           0        14           getrusage
  0.00    0.000000           0        38           setitimer
  0.00    0.000000           0        19           flock
  0.00    0.000000           0        40           mlock
  0.00    0.000000           0        40           munlock
  0.00    0.000000           0         6           nanosleep
  0.00    0.000000           0        27           rt_sigaction
  0.00    0.000000           0        31           rt_sigprocmask
  0.00    0.000000           0        13           pread64
  0.00    0.000000           0        18           pwrite64
  0.00    0.000000           0        78           mmap2
  0.00    0.000000           0       111        10 stat64
  0.00    0.000000           0        49           lstat64
  0.00    0.000000           0       182           fstat64
  0.00    0.000000           0         8           socket
  0.00    0.000000           0         8         5 connect
  0.00    0.000000           0        19           accept
  0.00    0.000000           0         7           send
  0.00    0.000000           0        66           recv
  0.00    0.000000           0         3           recvfrom
  0.00    0.000000           0        20         1 shutdown
  0.00    0.000000           0         5           setsockopt
  0.00    0.000000           0         4           getsockopt
------ ----------- ----------- --------- --------- ----------------
100.00    0.032009                 28080       342 total

Yes, out scripts reads much information. This is normal.
But why munmap works very long??!! And when we have problem munmap ALWAYS in top!
For comparison this is result of trace random php-fpm process in regular situation:

Process 28606 attached - interrupt to quit
Process 28606 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.72    0.001816           1      2601           read
 32.88    0.001306         435         3           clone
  9.19    0.000365           0      2175           write
  6.95    0.000276           0      7521           time
  2.24    0.000089           0      4158           gettimeofday
  2.01    0.000080           1       114           brk
  0.28    0.000011           0      2166           poll
  0.20    0.000008           0       833       155 access
  0.20    0.000008           0        53           recv
  0.18    0.000007           2         3           waitpid
  0.15    0.000006           0        18           munlock
  0.00    0.000000           0        69           open
  0.00    0.000000           0        96           close
  0.00    0.000000           0        29           chdir
  0.00    0.000000           0        36           lseek
  0.00    0.000000           0         3           pipe
  0.00    0.000000           0        29           times
  0.00    0.000000           0        10           getrusage
  0.00    0.000000           0         5           munmap
  0.00    0.000000           0         1           ftruncate
  0.00    0.000000           0        29           setitimer
  0.00    0.000000           0         1           sigreturn
  0.00    0.000000           0        11           flock
  0.00    0.000000           0        18           mlock
  0.00    0.000000           0         5           nanosleep
  0.00    0.000000           0        19           rt_sigaction
  0.00    0.000000           0        24           rt_sigprocmask
  0.00    0.000000           0         6           pread64
  0.00    0.000000           0        12           pwrite64
  0.00    0.000000           0        69           getcwd
  0.00    0.000000           0         5           mmap2
  0.00    0.000000           0        35         7 stat64
  0.00    0.000000           0        41           lstat64
  0.00    0.000000           0        96           fstat64
  0.00    0.000000           0       108           fcntl64
  0.00    0.000000           0        87           clock_gettime
  0.00    0.000000           0         5           socket
  0.00    0.000000           0         4         4 connect
  0.00    0.000000           0        16         2 accept
  0.00    0.000000           0         8           send
  0.00    0.000000           0        15           shutdown
  0.00    0.000000           0         4           getsockopt
------ ----------- ----------- --------- --------- ----------------
100.00    0.003972                 20541       168 total

Process 29168 attached - interrupt to quit
Process 29168 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.81    0.002366           1      1717           read
 26.41    0.001140           1      1696           poll
  8.29    0.000358           0      1662           write
  7.37    0.000318           2       131       121 stat64
  1.53    0.000066           0      3249           gettimeofday
  1.18    0.000051           0       746       525 access
  0.23    0.000010           0        27           fcntl64
  0.19    0.000008           0        62           brk
  0.00    0.000000           0         1           restart_syscall
  0.00    0.000000           0         7           open
  0.00    0.000000           0        16           close
  0.00    0.000000           0         3           chdir
  0.00    0.000000           0      1039           time
  0.00    0.000000           0         1           lseek
  0.00    0.000000           0         3           times
  0.00    0.000000           0         3           ioctl
  0.00    0.000000           0         1           getrusage
  0.00    0.000000           0         4           munmap
  0.00    0.000000           0         3           setitimer
  0.00    0.000000           0         1           sigreturn
  0.00    0.000000           0         1           flock
  0.00    0.000000           0         1           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           pwrite64
  0.00    0.000000           0         3           getcwd
  0.00    0.000000           0         4           mmap2
  0.00    0.000000           0         7           fstat64
  0.00    0.000000           0         9           clock_gettime
  0.00    0.000000           0         6           socket
  0.00    0.000000           0         5         1 connect
  0.00    0.000000           0         3         2 accept
  0.00    0.000000           0         5           send
  0.00    0.000000           0        64           recv
  0.00    0.000000           0         3           recvfrom
  0.00    0.000000           0         2           shutdown
  0.00    0.000000           0         1           getsockopt
------ ----------- ----------- --------- --------- ----------------
100.00    0.004317                 10489       649 total

And you can see that munmap not in top.
Now we don’t have ideas how to solve this problem :(
We examined next potential problems and answers are "NO":

  • additioan user activity
  • long scripts execution (several seconds)
  • using swap

Can you help us?

dhythhsba
  • 972
  • 2
  • 11
  • 21
  • 1
    How about memory usage? Try to monitor it when load is high again. – Roman Newaza Mar 21 '13 at 08:53
  • What type of PM Manager are you using? IE ondemand, static, dynamic? If it is not ondemand, can you share your settings configuration for that PM? – Diemuzi Mar 22 '13 at 13:07

0 Answers0