Wordpress Environment and Exceedingly high usage (PHP-FPM)

Question

I run an instance of Wordpress on my server. My server needs to support at least 1,000 concurrents at a time.

I am using PHP-FPM (PHP 5.4) on Apache with FastCGI, as well as Memcache, and APC for opcode caching. We have two MySQL servers running as slaves.

The server has the following resource capacity:

Ram: 32GB
CPU: 8 Cores

My user that runs the Apache server does so with the following ulimit:

Hard: 4096
Soft: 1024

Intermittently we have downtime. When it hits, we are served 5xx errors (ranging from 500 to 504) by Nginx, which acts as our load balancer on a separate server. When we get these errors, htop shows that we have maxed out our RAM usage and, intermittently, our CPU usage (I assume that's database related?). The processes consuming these resources are the PHP-FPM child processes.

I am not the sysadmin, merely the developer, so this is starting to get out of my reach.

The php error log seems to report the following:

[Mon Oct 10 12:54:33 2016] [error] [client 155.234.240.16] (104)Connection reset by peer: FastCGI: comm with server "/[MYURL].fcgi" aborted: read failed, referer: [MYURL]
[Mon Oct 10 12:54:33 2016] [error] [client 155.234.240.16] FastCGI: incomplete headers (0 bytes) received from server "/[MYURL].fcgi", referer: [MYURL]
[Mon Oct 10 12:54:34 2016] [error] [client 146.231.88.181] Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.

With the information I've given thus far, would you be able to assist me in finding a direction to go in, in order to begin diagnosing this issue? I can provide further information if need be.

Despite the current issue you are having a good profiling/monitoring tool would help a lot. I am currently using https://newrelic.com/ but few friends of mine recommend https://blackfire.io/ — jakub wrona, Oct 10 '16 at 11:31
Thanks for your response. We are running New Relic, I've requested access. Is there anything from there I could post on the question that may help further diagnosis? — Matthew, Oct 10 '16 at 11:34
It will help you track peaks, bottlenecks and potential issues. I think the problem is too wide and too "relative" to just post a particular info from New Relic, but look for anything suspicious. You know your system so if a given function eats much too much, have a closer look. Same with cycles/milliseconds. — jakub wrona, Oct 10 '16 at 11:43
My sys admins added an extra 32GB of RAM. It's not a solution that I am happy with as we should not be sitting on 32GB of usage with 600 concurrents. The site has stopped falling over but my htop shows my RAM sitting at (an almost static) 32GB of usage. — Matthew, Oct 10 '16 at 13:54
Often it's an issue with the htaccess file. Do you have access to the htaccess file? And if so, can you post it? — kayleighsdaddy, Oct 12 '16 at 13:16

score 5 · Accepted Answer · answered Oct 12 '16 at 15:24

These errors are common in two situations with WordPress: an XML-RPC attack, or a wrapper config that does not allow the FastCGI spawning that is needed. The problem is too broad to pin down given the combination of Apache2 with Nginx in front, so I am writing this as steps.

FastCGI is meant to keep a site from going down under a denial-of-service attack or crashing due to memory leaks. With Nginx and PHP-FPM, a situation like this always calls for checking for an XML-RPC attack (or a similar brute-force attempt) and blocking it. If one IP makes 600 requests within a day, it is obviously an attack. So the first step is to check for an XML-RPC attack, block WordPress's infamous xmlrpc.php file, and count how many times a few IPs have requested it repeatedly. How to check for a fake PHP5-FPM attack is described in the guide wordpress-xml-rpc-attack-fake-php5-fpm-error, which covers logs for Nginx (you run Apache2 with Nginx in front, but you can use the commands I wrote in that guide to extract the errors or IPs).
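The per-IP count from this first step can be sketched in a few lines of Python. This is only an illustration, not the commands from the guide: it assumes the access log uses the common/combined layout (client IP first, request line in quotes), and the function name `count_xmlrpc_hits` and the sample lines are made up for the example.

```python
import re
from collections import Counter

# Assumption: common/combined log format, e.g.
# 203.0.113.5 - - [10/Oct/2016:12:54:33 +0200] "POST /xmlrpc.php HTTP/1.1" 200 370
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)')


def count_xmlrpc_hits(lines, path="/xmlrpc.php"):
    """Return a Counter mapping client IP -> number of requests to `path`."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed layout
        ip, _method, url = match.groups()
        # Ignore any query string when comparing the request path.
        if url.split("?", 1)[0] == path:
            hits[ip] += 1
    return hits


# Hypothetical sample lines; in practice, pass an open access log file instead.
sample = [
    '203.0.113.5 - - [10/Oct/2016:12:54:33 +0200] "POST /xmlrpc.php HTTP/1.1" 200 370',
    '203.0.113.5 - - [10/Oct/2016:12:54:34 +0200] "POST /xmlrpc.php HTTP/1.1" 200 370',
    '198.51.100.7 - - [10/Oct/2016:12:54:35 +0200] "GET /index.php HTTP/1.1" 200 5120',
]

for ip, count in count_xmlrpc_hits(sample).most_common(10):
    print(ip, count)
```

IPs at the top of that list with hundreds of hits are the candidates to block at the firewall or in the web server config.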

As the second step, the incomplete headers error with Apache2 + PHP-FPM calls for a look at your fcgi wrapper (/dev/shm/blackmou-php.fcgi) or .htaccess for the FastCGI spawning settings. This is an example of a wrapper config:

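# 0 means PHP spawns no children itself; the FastCGI process manager does the spawning
PHP_FCGI_CHILDREN=0
export PHP_FCGI_CHILDREN
# Recycle each PHP process after it has served this many requests
PHP_FCGI_MAX_REQUESTS=10000
export PHP_FCGI_MAX_REQUESTS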

You may also need to increase memory_limit in php.ini. For a similar situation on Nginx, we adjust fastcgi_buffers and fastcgi_max_temp_file_size:

fastcgi_buffers 256 16k;
fastcgi_max_temp_file_size 0;
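Since memory_limit caps what a single PHP script may allocate, raising it also raises the worst-case footprint of every PHP child, so it is worth checking the numbers against the 32GB in the question. The following is a rough back-of-the-envelope sketch, not a tuning rule: the 4GB reserved for the OS and other services and the per-child sizes are assumptions, and the real per-child figure should be measured (for example from the resident memory of the PHP processes in htop).

```python
def max_children_for_ram(total_ram_mb, reserved_mb, per_child_mb):
    """Upper bound on PHP children that fit in RAM without swapping."""
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    return max(usable_mb // per_child_mb, 0)


total_ram_mb = 32 * 1024  # 32GB, as stated in the question
reserved_mb = 4 * 1024    # assumption: left for the OS, Apache, Memcache, etc.

# Assumed per-child sizes in MB; replace with measured values.
for per_child_mb in (64, 128, 256):
    children = max_children_for_ram(total_ram_mb, reserved_mb, per_child_mb)
    print(f"{per_child_mb} MB per child -> at most {children} children")
```

If the number of children needed for the target concurrency is far above these bounds, the box will run out of RAM under load regardless of how the FastCGI buffers are set.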

If none of the above is the issue, as the third step, enable WP_DEBUG in your wp-config.php file. You might get a more useful error message pointing to a plugin issue, but there is no guarantee.

If that does not reveal the issue, as the fourth step, deactivate all plugins and switch to the default theme for a few minutes. If the errors stop appearing, a theme or plugin is the problem.

Also, as the fifth step, you can use the Xdebug profiler to investigate further.

Notes:

If you suspect the database is faulty, use WordPress's built-in function to repair the database. However, that is unlikely to be the cause.
You should also have iptables and fail2ban properly configured, limit access to wp-login.php, and so on.

Thanks, you answered the question. I will begin diagnosing using your steps and hopefully I can get somewhere with that. — Matthew, Oct 13 '16 at 09:30
