I have a real head-scratcher: one site on a server is affecting other sites that run in separate php-fpm pools. I thought the idea was that separate php-fpm pools gave isolation to stop (or at least reduce) this.

We have a typical LEMP server (Ubuntu 16.04 running NGINX with both php-fpm 7.0 and 5.6; MySQL is on another box) hosting a number of sites of various sizes.

To clarify, this does not appear to be a server resource issue. We have checked CPU, memory, inodes, open files, networking and everything else we can think of, and the server still has plenty of headroom.

The pools, however, are limited in resources...

/etc/php/5.6/fpm/pool.d/siteone.conf

[siteone]
user = siteone
group = siteone
listen = /var/run/php5.6-fpm-siteone.sock
listen.owner = www-data
listen.group = www-data
pm = dynamic
pm.max_children = 25
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3
php_admin_value[error_log] = /var/log/php-fpm/siteone/siteone-php-fpm.log
php_value[newrelic.appname] = "siteone - LIVE"
php_admin_value[upload_max_filesize] = 5M
request_terminate_timeout = 5m
pm.max_requests = 5000

Yet when one site hits `pm.max_children`, all the sites start timing out.

Any advice, please?

Thank you in advance.

Dogsbody
  • Are you certain you've got pool**s** and not just one shared pool? You do have to specifically set up each individual pool - the default config will give you just one that all sites use. – ceejayoz Sep 18 '18 at 14:50
  • Very certain we have a separate pool for each site :-) – Dogsbody Sep 18 '18 at 14:52
  • You're certain they're being used as you expect? What sort of behavior do you see when one of the sites hits the limits? (Sorry, we get enough "oh that was silly of me" issues around here I'm trying to clear out the basics.) – ceejayoz Sep 18 '18 at 15:01
  • Totally understand. Three of us have looked at this now, so I'm not expecting that. Each site has its own socket, which wouldn't even work if the pools were set up wrong, as the site files would be owned by the wrong user. I've updated the OP with the full pool config. When this happens, one site hits `pm.max_children` and all the sites start timing out. – Dogsbody Sep 18 '18 at 15:11
  • You say you have both PHP 5.6 and 7.0 running. Is this problem across both? In other words, if a site in 5.6 hits `max_children` and all of the other 5.6 sites time out, does that also affect the 7.0 sites? Also, do you see this problem if a 7.0 site hits `max_children`? – Moshe Katz Sep 20 '18 at 19:02
  • We have great isolation between 5.6 and 7.0. When this happens, all 7.0 sites stay up and running without issue. We also haven't seen this issue at all in 7.0; it does seem to be a 5.6-only thing :-/ – Dogsbody Sep 21 '18 at 08:56
  • If you enable the php-fpm status page, with an nginx location passing to it that is locked down with `allow 127.0.0.1; deny all;`, then querying it locally (e.g. with curl) will give you some statistics on the listen queue. If the listen queue gets too full, it will start rejecting connections and nginx will show a 502/504 response. You can tune the php-fpm config value `listen.backlog` to increase the queue. You might have to tweak some kernel variables too. – jedifans Oct 10 '18 at 05:16
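
For anyone following the status-page suggestion above, here is a minimal sketch of how the plain-text status output could be checked for listen-queue pressure. It is not from the original thread: the function names `parse_fpm_status` and `queue_warnings` and the sample figures are hypothetical, and it assumes the status page has been enabled with `pm.status_path` in the pool config and returns the usual "key: value" lines.

```python
# Hypothetical sample of a pool's plain-text status output.
# The field names follow the php-fpm status page; the numbers are made up.
SAMPLE_STATUS = """\
pool:                 siteone
process manager:      dynamic
accepted conn:        12073
listen queue:         14
max listen queue:     128
listen queue len:     128
idle processes:       0
active processes:     25
total processes:      25
max active processes: 25
max children reached: 3
slow requests:        0
"""


def parse_fpm_status(text):
    """Turn 'key: value' status lines into a dict, converting plain integers."""
    stats = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        stats[key.strip()] = int(value) if value.isdigit() else value
    return stats


def queue_warnings(stats):
    """Return human-readable warnings about queueing in one pool."""
    warnings = []
    if stats.get("listen queue", 0) > 0:
        warnings.append("requests are currently waiting in the listen queue")
    queue_len = stats.get("listen queue len", 0)
    if queue_len and stats.get("max listen queue", 0) >= queue_len:
        warnings.append("the listen queue has been full since the pool started")
    if stats.get("max children reached", 0) > 0:
        warnings.append("pm.max_children has been reached")
    return warnings


if __name__ == "__main__":
    stats = parse_fpm_status(SAMPLE_STATUS)
    for warning in queue_warnings(stats):
        print(f"{stats['pool']}: {warning}")
```

Running something like this against each pool's status output during an incident would show whether only the busy pool is queueing or whether the other 5.6 pools are backing up as well.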

0 Answers