
I have been conducting load-tests (via blitz.io) as I attempt to tune server performance on a pool of servers running php 5.5, wordpress 3.9.1, and nginx 1.6.2.

My confusion arises when I overload a single server with too much traffic. I fully realize that there are finite resources on a server and at some level it will have to begin rejecting connections and/or returning 502 (or similar) responses. What's confusing me though, is why my server appears to be returning 502s so early within a load test.

I have attempted to tune nginx to accept several connections:

nginx.conf

worker_processes auto;
worker_rlimit_nofile 100000;

events {
    worker_connections 1024;
    use epoll;
    multi_accept on;
}

site.conf

location ~ \.php$ {
    try_files $uri =404;
    include /etc/nginx/fastcgi_params;
    fastcgi_pass unix:/var/run/php5-fpm.sock;
    fastcgi_index index.php;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_read_timeout 60s;
    fastcgi_send_timeout 60s;
    fastcgi_next_upstream_timeout 0;
    fastcgi_connect_timeout 60s;
}

php www.conf

pm = static
pm.max_children = 8

I expect the load test to saturate the PHP workers rather quickly. But I also expect nginx to continue accepting connections and, once the fastcgi timeouts are hit, begin returning some sort of HTTP error code.

What I'm actually seeing is nginx returning 502s almost immediately after the test is launched.

nginx error.log

2014/11/01 20:35:24 [error] 16688#0: *25837 connect() to unix:/var/run/php5-fpm.sock failed 
(11: Resource temporarily unavailable) while connecting to upstream, client: OBFUSCATED, 
server: OBFUSCATED, request: "GET /?bust=1 HTTP/1.1", upstream: 
"fastcgi://unix:/var/run/php5-fpm.sock:", host: "OBFUSCATED"

What am I missing? Why aren't the pending requests being queued up, and then either completing or timing out later in the process?

reustmd

2 Answers


This means the PHP side crashed and is no longer listening on the unix socket.

So nginx won't queue anything: it simply can't contact the proxied server to send requests to, and at that point you can easily imagine that requests get "processed" very fast on nginx's side.

If your PHP server hadn't crashed, requests would indeed wait, up to the fastcgi_connect_timeout and fastcgi_read_timeout values, for some event to show up. If those timeouts were reached, you would see 504 error codes instead.

Incidentally, your worker_connections (1024) looks low compared to your worker_rlimit_nofile (100000).

It may also be time to start using an upstream block to control how nginx behaves when a target server seems down, using health checks. There you can set how long the delay is before a server is marked as down; once a server is considered down, requests won't reach it until the health-check condition to mark it up again passes.
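As a rough sketch of that idea (the upstream name and timings here are illustrative assumptions, not taken from the question's config), stock open-source nginx supports passive health checking via max_fails and fail_timeout on an upstream server; active health_check probes are a commercial NGINX Plus feature:

```nginx
# Hypothetical upstream block with passive health checking.
# After max_fails failed attempts within fail_timeout, nginx marks the
# server down and stops sending it traffic for fail_timeout.
upstream php_backend {
    server unix:/var/run/php5-fpm.sock max_fails=3 fail_timeout=30s;
}

server {
    location ~ \.php$ {
        try_files $uri =404;
        include /etc/nginx/fastcgi_params;
        fastcgi_pass php_backend;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    }
}
```

With a single backend this mostly changes what error nginx returns and how quickly, but it becomes genuinely useful once there is more than one server in the pool.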

Xavier Lucas
  • How can I look into the "php server" crash you're suggesting? Using htop to monitor what's happening on the server, the php processes seem to all be chugging along just fine. I'm getting a reasonable number of successful responses throughout the test. – reustmd Nov 01 '14 at 22:19
  • @manu08 First, monitor the unix socket with netstat -x and monitor the nofile/nproc limits per user; it's likely that you reached a user limit of this kind. If those look fine, use strace. – Xavier Lucas Nov 01 '14 at 22:51
  • thanks for the information. I'll keep reading what I can find, but this is a bit new for me, so I'm not 100% sure how to recognize the culprit when I see it. Any resources you can link to that'll help me wrap my head around this? Also, shouldn't this show up in some PHP or linux logs, if the issue is a user limit? – reustmd Nov 02 '14 at 02:28

Your problem is most likely with your PHP-FPM configuration, because you are using the static process manager with only 8 child processes. Just about any load test will use up those 8 child processes instantly and beg for more. When there isn't an idle child process available to handle the PHP request, you get the 502 errors you are seeing.

You should switch to the dynamic process manager, or even better (in my opinion), ondemand.

Also, set your max_children fairly high, depending on what kinds of load tests you are running. Without knowing the details of the tests you are running I can't suggest any values for max_children. In my case, where I have several sites which as a whole get ~2,500 unique visitors and ~15,000 pageviews daily, my max_children is set to 64 and it never gets even close to that number. I set it higher than I need because load testing has indicated that my server can handle quite a bit more traffic than it is currently getting.

Once you get the load tests running well you'll have a better idea of how to tune your PHP-FPM configuration. I'd say set max_children to 64 like I do; just check the PHP-FPM log to see if you are banging up against that limit and adjust upwards as needed.
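For example, a pool configuration using the ondemand manager might look like this (the values are illustrative assumptions, not tuned for any particular server):

```ini
; Hypothetical fragment of a php-fpm pool file, e.g. /etc/php5/fpm/pool.d/www.conf.
; ondemand forks workers only when requests arrive and reaps idle ones.
pm = ondemand
pm.max_children = 64          ; hard cap on concurrent PHP workers
pm.process_idle_timeout = 10s ; kill workers idle longer than this
pm.max_requests = 500         ; recycle each worker after N requests (guards against leaks)
```

If the pool hits the cap, php-fpm logs a warning about reaching pm.max_children, which is the signal to raise it (memory permitting).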

  • Thanks Justin. I will play around with higher max_children limits, but I was under the impression that nginx should wait for those busy children to free up based on the cgi timeouts. I didn't realize I would need a php child process per nginx connection. – reustmd Nov 02 '14 at 23:29
  • If it were timing out, you would be receiving a 504 error, not a 502. In your case, there are no processes available to even make a connection to, because there are only 8 child processes (and no way to spawn more since you are using the static process manager). Anything but the lightest of load testing will likely throw 502's almost immediately. Check your PHP-FPM log -- if you see warnings that pool www has reached pm.max_children setting, that is the source of your 502's. – Justin L. Franks Nov 03 '14 at 20:12