I run a WordPress instance on my server, which needs to support at least 1,000 concurrent users.
I am using PHP-FPM (PHP 5.4) on Apache with FastCGI, along with Memcache and APC for opcode caching. We have two MySQL servers running as slaves.
The server has the following resource capacity:
RAM: 32 GB
CPU: 8 Cores
The user that runs the Apache server has the following ulimit values:
Hard: 4096
Soft: 1024
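For anyone wanting to confirm these values, here is a minimal sketch that reads the limits from inside a process. It assumes the figures above are the open-file limit (RLIMIT_NOFILE), which is my guess and not something I have confirmed, and the function name `open_file_limits` is just illustrative. It only reports the limits of whichever user runs it, so it would need to be run as the Apache user to be meaningful.

```python
import resource


def open_file_limits():
    """Return (soft, hard) open-file limits for the current process.

    Assumption: the ulimit values quoted above refer to RLIMIT_NOFILE.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("Soft:", soft)
    print("Hard:", hard)
```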
We have intermittent downtime. When it hits, Nginx (which acts as our load balancer on a separate server) serves us 5xx errors, ranging from 500 to 504. While this is happening, htop shows that we've maxed out our RAM usage and, intermittently, our CPU usage as well (I assume that's database related?). The processes consuming these resources are the PHP-FPM child processes.
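To sanity-check the RAM side, this is a rough sketch of how I would estimate how many PHP-FPM children fit in memory. The 64 MB average per child and the 4 GB reserved for the OS and other services are hypothetical placeholders, not measurements from this server, and `estimate_max_children` is an illustrative name.

```python
def estimate_max_children(total_ram_mb, reserved_mb, avg_child_mb):
    """Estimate how many PHP-FPM children fit in RAM without swapping.

    total_ram_mb: physical RAM on the box
    reserved_mb:  RAM set aside for the OS, Apache, Memcache, etc. (assumed)
    avg_child_mb: average resident memory of one PHP-FPM child (assumed)
    """
    if avg_child_mb <= 0:
        raise ValueError("avg_child_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    # Floor division: a partial child does not fit.
    return max(usable_mb // avg_child_mb, 0)


if __name__ == "__main__":
    # 32 GB box, 4 GB reserved, 64 MB per child (placeholder figures).
    print(estimate_max_children(32 * 1024, 4096, 64))  # prints 448
```

If the real per-child figure is higher than the placeholder, the number of children the box can hold drops accordingly, which might be relevant to the RAM exhaustion I see in htop.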
I am not the sysadmin, merely the developer, so this is starting to get beyond my reach.
The PHP error log seems to report the following:
[Mon Oct 10 12:54:33 2016] [error] [client 155.234.240.16] (104)Connection reset by peer: FastCGI: comm with server "/[MYURL].fcgi" aborted: read failed, referer: [MYURL]
[Mon Oct 10 12:54:33 2016] [error] [client 155.234.240.16] FastCGI: incomplete headers (0 bytes) received from server "/[MYURL].fcgi", referer: [MYURL]
[Mon Oct 10 12:54:34 2016] [error] [client 146.231.88.181] Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.
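To see which of these errors dominates during an outage, a small sketch like the following could tally them. The regular expression is written only against the three lines above, so it is an assumption that the rest of the log follows the same layout, and the category names are my own labels.

```python
import re
from collections import Counter

# Matches lines shaped like: [timestamp] [level] [client ip] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] "
    r"\[client (?P<client>[^\]]+)\] (?P<msg>.*)$"
)


def classify(msg):
    """Map a log message to a coarse, hand-picked category."""
    if "Connection reset by peer" in msg:
        return "fastcgi_reset"
    if "incomplete headers" in msg:
        return "fastcgi_incomplete_headers"
    if "internal redirects" in msg:
        return "redirect_loop"
    return "other"


def tally_errors(lines):
    """Count error-level entries per category; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "error":
            counts[classify(match.group("msg"))] += 1
    return counts
```

Feeding it the log file line by line (for example `tally_errors(open(path))`, with `path` pointing at wherever the log lives) would give a per-category count.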
Given the information I've provided so far, could you point me in a direction to begin diagnosing this issue? I can provide further information if needed.