
I've got a server running on a Linode with Ubuntu 10.04 LTS, Nginx 0.7.65, MySQL 5.1.41 and PHP 5.3.2 with PHP-FPM.

There is a WordPress blog on it, updated to WordPress 3.2.1 recently.

I have made no changes to the server (except updating WordPress) and while it was running fine, a couple of days ago I started having downtimes.

I tried to solve the problem by checking the error_log, where I saw many timeouts and messages that seemed to be related to timeouts. The server is currently logging errors like these:

2011/07/14 10:37:35 [warn] 2539#0: *104 an upstream response is buffered to a temporary file /var/lib/nginx/fastcgi/2/00/0000000002 while reading upstream, client: 217.12.16.51, server: www.example.com, request: "GET /page/2/ HTTP/1.0", upstream: "fastcgi://127.0.0.1:9000", host: "www.example.com", referrer: "http://www.example.com/"

2011/07/14 10:40:24 [error] 2539#0: *231 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 46.24.245.181, server: www.example.com, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.example.com", referrer: "http://www.google.es/search?sourceid=chrome&ie=UTF-8&q=example"

I even found a previous Server Fault discussion with a possible solution: edit /etc/php/etc/php-fpm.conf and replace

;request_terminate_timeout = 0

with

request_terminate_timeout = 30s

The server worked for some hours, and then broke again. I edited the file back to the way it was and restarted php-fpm again (service php-fpm restart), but no luck: the server worked for a few minutes and then the problem returned, over and over. The strange thing is that although the services are running, htop shows there is no CPU load (see image), and I really don't know how to solve the problem.

[htop screenshot: services running, near-zero CPU load]

The config files are on Pastebin:

  • The php-fpm.conf file is here
  • The /etc/nginx/nginx.conf is here
  • The /etc/nginx/sites-available/www.example.com is here

javipas

2 Answers


Have you tried, instead of using an `upstream` block in nginx.conf, something like this:

# Pass PHP scripts to PHP-FPM
location ~* \.php$ {
   try_files       $uri /index.php;
   fastcgi_index   index.php;
   fastcgi_pass    127.0.0.1:9000;
   include         fastcgi_params;
   fastcgi_param   SCRIPT_FILENAME    $document_root$fastcgi_script_name;
   fastcgi_param   SCRIPT_NAME        $fastcgi_script_name;
}

Take a look at http://www.if-not-true-then-false.com/2011/nginx-and-php-fpm-configuration-and-optimizing-tips-and-tricks/

adrian7
  • Thanks for the tip, adrian7. At the time, the problem was solved thanks to the hosting provider's tips. I'll keep your point in mind (I don't know if it would work; everything is fine now); it could be useful in similar situations in the future. Regards! – javipas May 14 '12 at 09:24
  • What was the "hosting provider tips"? – Daniel T. Magnusson Jun 18 '12 at 09:36
  • I'm afraid I don't remember :( In fact, that server no longer exists. Some time ago I moved everything from there to another VPS. Sorry! – javipas Jul 11 '12 at 15:33
  • @adrian7 Your suggestion is absolutely useless, as there's no difference between `fastcgi_pass 127.0.0.1:9000;` and `upstream backend { server 127.0.0.1:9000; } fastcgi_pass backend;`. – VBart Jul 23 '12 at 02:04

and restarted again php-fpm [...] the server worked for a few minutes and back to the problem over and over

The problem is the php-fpm config.

But it's not the timeout. Increasing the timeout just gives PHP more time to process a single request, which may mask the symptoms but is not the right solution.

The php-fpm log should make apparent why the server is struggling; in my experience (obviously, in the absence of more information, this is a guess) the php-fpm log file will contain entries like this:

#/var/log/php5-fpm.log
[19-Oct-2014 06:25:10] NOTICE: error log file re-opened
[19-Oct-2014 17:46:56] WARNING: [pool www] seems busy (you may need to increase
pm.start_servers, or pm.min/max_spare_servers), spawning 1 children, there are 
1 idle, and 5 total children
...

If there are only a few log entries like the above, that's not much of a problem. If there are many, only minutes or seconds apart, then php-fpm has insufficient resources for the load it's being asked to cope with.
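As a quick check, you can count how often that warning fires. The log path is an assumption for a Debian/Ubuntu-style setup, and the sample lines below are faked with printf so the pipeline is self-contained:

```shell
# Hypothetical sample lines; in practice, replace the printf with:
#   grep 'seems busy' /var/log/php5-fpm.log
# Counting the warnings tells you whether the pool is under-provisioned.
printf '%s\n' \
  '[19-Oct-2014 17:46:56] WARNING: [pool www] seems busy' \
  '[19-Oct-2014 17:47:03] WARNING: [pool www] seems busy' |
  grep -c 'seems busy'
```

A count that climbs by the minute points at too few workers; a handful spread over days is harmless.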

This is not uncommon, because a standard distribution php-fpm config file will contain something similar to this:

# /etc/php5/fpm/pool.d/www.conf
pm = dynamic
pm.max_children = 5
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3

Which means php-fpm will handle at most 5 requests in parallel.

Especially with something like WordPress, where a single HTML page triggers a large number of follow-up requests (images, CSS, JS files, etc.) that may also end up at PHP, it is easy for a large and ever-growing queue of requests to form: any given request must first wait for the in-progress and already-queued requests to be processed. This leads to delays (it will show up as waiting time in any browser profiling tool) and frequently leads to a large number of timeouts.
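To see why only 5 workers back up quickly, here's a back-of-envelope calculation (all numbers are made up for illustration, not measurements from this server):

```shell
# 5 workers each taking 2 s per request can serve 5/2 = 2.5 req/s.
# At an arrival rate of 3 req/s, the queue grows by 3 - 2.5 = 0.5 req/s,
# i.e. roughly 30 extra queued requests every minute until timeouts start.
awk 'BEGIN { print 3 - 5/2 }'
```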

Also note that a large number of 404s (requests for anything that doesn't exist) is an easy way to amplify the limitations of any server: check for and fix any 404s that the site is generating.
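A quick way to spot them is to scan the access log for 404 responses. The log path and format are assumptions; the sample lines below stand in for a real nginx access log in the default combined format:

```shell
# In practice, replace the printf with: cat /var/log/nginx/access.log
# In the combined log format, $9 is the status code and $7 the request path.
printf '%s\n' \
  '1.2.3.4 - - [14/Jul/2011:10:37:35 +0000] "GET /favicon.ico HTTP/1.1" 404 162' \
  '1.2.3.4 - - [14/Jul/2011:10:37:36 +0000] "GET /page/2/ HTTP/1.1" 200 5120' \
  '5.6.7.8 - - [14/Jul/2011:10:38:01 +0000] "GET /favicon.ico HTTP/1.1" 404 162' |
  awk '$9 == 404 { print $7 }' | sort | uniq -c | sort -rn
```

The output lists the most frequently missing URLs first, which is usually enough to see what to fix.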

How to fix it

If the problem is that php-fpm has too few server processes running, just increase them. The right numbers depend on the hardware of the server it is deployed upon; here's a suggestion:

# /etc/php5/fpm/pool.d/www.conf
pm = dynamic
pm.max_children = 20
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 15

This would permit serving 20 requests in parallel, and should alleviate the problem without causing the server to struggle.
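Where does a number like 20 come from? A common rule of thumb (the figures below are assumptions, not measurements from this server) is the memory you can spare for PHP-FPM divided by the average size of one worker:

```shell
# Rough sizing sketch: max_children ≈ RAM budget for PHP-FPM divided by the
# average per-worker resident size (check yours with: ps -ylC php5-fpm).
FREE_MB=500      # assumed memory budget for PHP-FPM on a small Linode
WORKER_MB=25     # assumed average RSS of one php-fpm worker
echo $(( FREE_MB / WORKER_MB ))
```

With 500 MB spare and 25 MB per worker, that yields 20 children; plug in your own measurements rather than these placeholder values.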

If in doubt though, there's a simple rule to follow when changing php-fpm config:

  • Increase until error messages disappear (and performance is acceptable)
  • Decrease if the server runs out of memory or server load is unacceptable :)
AD7six
  • I'm afraid the system is no longer running, so I can't really tell if that change could make things work as expected. Thank you for your help though. – javipas Nov 07 '14 at 16:47
  • Hmm, not sure. The `php5-fpm.log` message you show above does not indicate server overload, just a fifth parallel request coming in. Since 5 is the max. number of child processes, other requests will then go into a backlog. From this, the OP's situation would result if queue items stay for 3 min in that backlog and time out. Which means, massive server overload. Which is not cured by more PHP-FPM child processes. As a rule of thumb, these should be about the same number as CPU cores on the server. – tanius Jan 30 '15 at 22:35
  • @tanius the question states "almost zero load consumption" - running out of workers is not an indicator of massive overload, only of not being able to process requests faster than they come in. You only need 1 slow request and a few concurrent users/requests to run out of workers. _FPM_ workers are ordinarily configured based on the amount of _free memory_, not CPUs - i.e. php processes aren't normally cpu-bound (they wait for databases, they wait for api responses, they are badly written and wait for themselves... etc.). OTOH this answer is based on an assumption - just like the above comment =). – AD7six Jan 31 '15 at 09:14
  • @AD7six I've read things up again and you're right. *FPM* workers are configured based on free mem, *nginx* workers are configured according to CPU core count (like [here](http://stackoverflow.com/a/5456191/1270008)). Must've mixed things up. – tanius Jan 31 '15 at 17:10