1

I have a WordPress multi-user site that pegs all of my CPUs at more than 90% usage:

top - 12:02:58 up 55 days,  5:25, 10 users,  load average: 20.51, 15.66, 14.90
Tasks: 294 total,  24 running, 270 sleeping,   0 stopped,   0 zombie
Cpu0  : 87.5%us,  8.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  4.5%si,  0.0%st
Cpu1  : 97.9%us,  1.9%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu2  : 96.0%us,  3.5%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.5%si,  0.0%st
Cpu3  : 97.6%us,  2.1%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu4  : 97.1%us,  2.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu5  : 97.9%us,  1.9%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu6  : 97.9%us,  1.6%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.5%si,  0.0%st
Cpu7  : 96.0%us,  3.5%sy,  0.0%ni,  0.3%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  14369424k total, 11903548k used,  2465876k free,   402360k buffers
Swap:  4063200k total,  3594784k used,   468416k free,  1484116k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                              
30658 apache    16   0  274m  97m 6304 R 62.1  0.7   0:12.49 php-cgi
30686 apache    16   0  213m  92m 6040 R 52.2  0.7   0:03.27 php-cgi
30685 apache    15   0  211m  87m 5764 S 50.3  0.6   0:04.50 php-cgi
28217 apache    16   0  529m 405m 6748 S 49.0  2.9   3:54.72 php-cgi
30468 apache    16   0  414m 291m 6452 R 48.5  2.1   0:49.78 php-cgi
29604 apache    15   0  258m 135m 6464 S 47.4  1.0   2:16.22 php-cgi
28308 apache    16   0  584m 408m 6724 R 43.9  2.9   3:43.07 php-cgi
28266 apache    16   0  550m 374m 6728 R 43.7  2.7   3:58.38 php-cgi
29573 apache    16   0  584m 407m 6592 R 36.8  2.9   1:59.88 php-cgi
30470 apache    16   0  219m  95m 6452 S 36.5  0.7   0:39.66 php-cgi
29138 apache    15   0  513m 334m 6528 S 33.6  2.4   2:03.14 php-cgi
30472 apache    17   0  441m 318m 6272 R 31.7  2.3   0:50.45 php-cgi
28283 apache    16   0  414m 291m 6580 R 29.3  2.1   3:53.06 php-cgi
29858 apache    16   0  251m 127m 6628 R 24.8  0.9   1:15.53 php-cgi
28253 apache    18   0  550m 374m 6580 R 24.5  2.7   4:08.05 php-cgi
30666 apache    15   0  217m  94m 5996 R 24.5  0.7   0:04.68 php-cgi
28208 apache    20   0  584m 407m 6436 R 24.2  2.9   4:36.36 php-cgi
29085 apache    25   0  358m 182m 6488 R 22.6  1.3   2:19.76 php-cgi
28258 apache    25   0  530m 407m 6512 R 22.4  2.9   3:58.70 php-cgi
29574 apache    16   0  530m 406m 6540 S 21.6  2.9   2:19.26 php-cgi
28947 apache    16   0  524m 401m 6476 R 14.1  2.9   2:32.33 php-cgi
28238 apache    15   0  488m 312m 6852 S 12.3  2.2   4:24.34 php-cgi
30464 apache    15   0  274m 151m 6176 R 11.2  1.1   0:19.67 php-cgi
28293 apache    16   0  269m 146m 6460 R  9.9  1.0   3:57.17 php-cgi
28205 apache    25   0  530m 407m 6496 R  9.6  2.9   4:05.49 php-cgi
30471 apache    19   0  263m 140m 6440 R  6.9  1.0   0:47.42 php-cgi

The output shows that the most CPU an individual process uses is ~60%, but there have been times when I've had as many as 7 processes using more than 90% CPU.

The site runs as follows:

  1. nginx works as a reverse proxy, serving every static file that it can and caching pages via the proxy_cache directive.

  2. It delegates to Apache when PHP scripts are required. These are run via mod_cgi using the ExecCGI option.

  3. Both Apache and nginx do compression on every human-readable file

  4. To avoid hitting MySQL all the time, we save HTML fragments in memcached, which currently caches between 2 and 4MB, as reported by the stats command in a telnet connection

  5. There's also some counters kept in a Redis database, mostly to count page views for every post.

  6. No WP Super Cache (nginx does the caching), no XCache.

I'm at a loss as to how to determine what exactly every php-cgi process is doing to require so much CPU - the site was heavily modified by several different software teams before we started maintaining it.

The PHP errors log shows mostly these errors:

  1. "Cannot redeclare class FacebookRestClientException"
  2. "Call to undefined function e_()"
  3. Invalid SQL syntax, mostly here: "WHERE post_id = xxxxx AND blog_id = "
  4. "Allowed memory size of 268,435,456 bytes exhausted"
  5. "Call to undefined method Services_JSON::encodeUnsafe()"

None of these actually perform any computation, so they can't be the source of the cpu problem.

I tried tracing system calls and saw lstat, read, write and access, which would cause waiting rather than CPU load if they were the problem (correct?). There were also calls to both poll and select.

Could someone give me pointers as to what to check next?

ptn777
  • The chances of a WordPress site having gone through several different development teams without at least one of those teams being incompetent is... low (I'd say the chances of *any* of them being competent to write efficient code is low, but that's just years of dealing with poorly written PHP sites talking). – womble Aug 04 '12 at 23:31
  • @womble Agreed - the malformed sql queries are evidence of that. Is there something like xdebug that would let me pinpoint the most cpu-intensive function calls? That's kind of what I was aiming for when doing `strace`. – ptn777 Aug 05 '12 at 04:09
  • That's a question for your developers. – womble Aug 05 '12 at 05:20

3 Answers

1

Your problem is here:

No WP Super Cache (nginx does the caching), no XCache.

Install Zend OPcache and W3 Total Cache and watch your CPU usage drop back down to almost nothing.

Zend OPcache alone should give you some breathing room.

Note that W3 Total Cache is not fully multi-site aware, and so it has to be configured on each site individually. It can be set up to use your existing memcached for caching.

You can also get rid of Apache. It's doing absolutely nothing for you.

(Note: APC is deprecated and has proven to be unreliable in practice. I currently recommend using Zend OPcache instead.)
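
As a rough starting point, a php.ini fragment enabling Zend OPcache might look like the sketch below. The directive names are OPcache's own, but every value is an illustrative assumption rather than something tuned for this site, and whether the `zend_extension` line is needed (and with what path) depends on how PHP was built and packaged.

```ini
; php.ini sketch; all values are illustrative assumptions, not tuned for this site
; Load the extension (the path, and whether this line is needed, depend on the PHP build)
zend_extension=opcache.so

opcache.enable=1
; Shared memory for compiled scripts, in megabytes
opcache.memory_consumption=128
; Upper bound on the number of cached scripts; a large multi-site install has many files
opcache.max_accelerated_files=10000
; Keep checking files for changes, at most once every 60 seconds
opcache.validate_timestamps=1
opcache.revalidate_freq=60
```

Keep in mind that an opcode cache lives in memory owned by running PHP processes, so it only pays off when those processes persist across requests (FastCGI or php-fpm) rather than being started fresh for each request.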

Michael Hampton
  • I'll give W3 Total Cache a look, although this site has been modified so badly that that might not work :( Also, I configure fcgid in the httpd.conf file, isn't Apache the container in which the PHP interpreter runs? – ptn777 Aug 04 '12 at 17:58
  • @ptn777: You said in your question that you're using PHP in CGI mode (and your `top` output bears that out); if you think you're doing something with fcgi, you're mistaken. As Michael says, Apache's doing nothing for you, get rid of it -- just run PHP as an fcgi and point nginx to use that directly. – womble Aug 04 '12 at 23:29
  • @womble I'm confused. If I run `service httpd stop`, all php-cgi processes die and the site stops working - unsurprisingly. I configured a wrapper script in `/usr/local/bin/` that sets the env var `PHP_FCGI_MAX_REQUESTS` and then `exec /usr/bin/php-cgi`. You say that I could do the same without Apache, are you referring to nginx's `fastcgi_pass`? – ptn777 Aug 05 '12 at 03:02
  • "These are run via mod_cgi using the ExecCGI option" -- yes, I'd say you're *very* confused. `fastcgi_pass` is, indeed, the configuration option to pass requests to a FastCGI backend using nginx. – womble Aug 05 '12 at 05:20
  • Not to mention these days we generally prefer to use `php-fpm` over FastCGI. A sample config comes with nginx, and `php-fpm` has its own startup script so you don't have to write one. – Michael Hampton Aug 05 '12 at 05:24
1

Judging by the other errors you're seeing, it'd be very surprising if there weren't some poorly thought-out code lurking about - a code review would be the best way to approach troubleshooting.

Cannot redeclare class FacebookRestClientException

We know that this class was loaded successfully, so I'd start by finding out which external APIs the scripts are calling, whether or not they're failing, and how long they tie things up while running (or failing to run) - a poorly thought-out call (or series of calls) to an external API could be responsible.

danlefree
0

Memory caching will definitely help, particularly for WordPress's internal object cache. As Michael says, W3 Total Cache isn't fully multi-site aware and it's rather comprehensive/complicated/heavy, so I'd recommend a purer, simpler alternative that's very similar to the setup at wordpress.com: APC Object Cache (which is faster than memcached for single servers) for the object cache and Batcache for full-page memory caching. Pay attention to the installation instructions for each, they're different from standard plugins.

Obviously you'll need to install APC for PHP if you don't have it already and tune it appropriately. For example, apc.stat=0 will speed things up significantly, but if you set this then you'll need to restart your PHP procs whenever any PHP files change (e.g. on plugin and WordPress core upgrades). Make sure you install the apc.php panel (you may have to get this from the APC source tarball depending on your OS's packages); it's very useful for tuning and debugging. (Lock it down/password protect it, mind.)
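
A minimal php.ini sketch of that tuning follows. The directive names are APC's, but the values are assumptions for illustration only; in particular, the format of `apc.shm_size` differs between APC releases, so check it against the version you install.

```ini
; php.ini sketch for APC; values are illustrative assumptions
extension=apc.so
apc.enabled=1
; Shared memory size; older APC releases expect a bare number of megabytes (e.g. 128)
apc.shm_size=128M
; Skip the per-request stat() check on cached files; restart PHP after any code change
apc.stat=0
```

Use the apc.php panel to see whether the cache is filling up or fragmenting before settling on a size.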

Alternatively, as you already have memcached installed, there's the Memcached Redux plugin, which serves the same function as the APC Object Cache. This may be the easier route.

You may not derive a huge amount of benefit from Batcache as you're already using nginx for a file-based proxy_cache, but it certainly won't hurt if you have some memory to spare and may help by bridging the gap between the file cache and directly hitting Apache so it's worth a shot.

Looking at your point 3, I'd highly recommend disabling gzipping in both Apache and PHP, i.e. disable mod_deflate and set zlib.output_compression = Off in php.ini. Nginx is your frontend and will do the compression for you anyway, so there's no need to do it twice; nginx will probably do it quicker/more efficiently and it'll save your Apache/PHP processes' CPU.

How many plugins are activated? Are they all essential? Can you disable them one by one and see what difference each makes? I've seen some terribly-coded plugins that have utterly crippled sites, so audit these if you can.

You mention that the site's been heavily modified by several different software teams. Is there any version control so you can see what changes have been made? Are they directly to the core, the theme or the plugins? If they're to the core, can you diff things out to get a better picture or is it too heavily modded? Moving forward, refactoring the alterations away from the core and into the site's theme functions.php & discrete plugins may make your life a lot easier, if you can do it.

Stef Pause