
I am in a situation where I cannot get PHP-FPM to work reliably even under a slight increase in traffic. I have been trying to trace the actual cause for a while, with no success so far.

It started with one particular site returning 502 errors. Looking into the PHP-FPM logs, I see this:

WARNING: [pool www-userA] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 39 idle, and 49 total children
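
The counters in that warning are easier to compare against the pool settings once they are pulled out of the line. Below is a minimal sketch that parses the message format exactly as it appears above; the function name `parse_busy_warning` and the derived `busy` field are illustrative, not part of PHP-FPM.

```python
import re

# Pattern written against the warning text quoted above (an assumption:
# other PHP-FPM versions may word the message differently).
BUSY_RE = re.compile(
    r"\[pool (?P<pool>[^\]]+)\] seems busy .*?"
    r"spawning (?P<spawning>\d+) children, "
    r"there are (?P<idle>\d+) idle, "
    r"and (?P<total>\d+) total children"
)

def parse_busy_warning(line):
    """Extract pool name and child counters from a 'seems busy' log line."""
    match = BUSY_RE.search(line)
    if match is None:
        return None
    idle = int(match["idle"])
    total = int(match["total"])
    return {
        "pool": match["pool"],
        "spawning": int(match["spawning"]),
        "idle": idle,
        "total": total,
        "busy": total - idle,  # derived: children currently serving requests
    }

SAMPLE = (
    "WARNING: [pool www-userA] seems busy (you may need to increase "
    "pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, "
    "there are 39 idle, and 49 total children"
)

print(parse_busy_warning(SAMPLE))
```

For the line above this gives 39 idle out of 49 total, so only 10 children were actually busy when the warning was logged, and the idle count sits just under the pm.min_spare_servers value of 40 used in the pool.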

Next I checked server resources: top shows the server is almost idle, with only 2-4% resource utilization. After that I tried tweaking the PHP-FPM pool:

pm = dynamic
pm.max_children = 800
pm.process_idle_timeout = 5s
pm.start_servers = 40
pm.min_spare_servers = 40
pm.max_spare_servers = 80
pm.max_requests = 500
php_admin_value[max_execution_time] = 60

;Added later to troubleshoot further
request_slowlog_timeout = 5s
slowlog = /var/log/pool_userA_fpm_slow_log

;Added later to compensate in case there is a queue issue (for troubleshooting)
listen.backlog = 24000

I have been through almost every PHP-FPM post related to this topic, including: https://stackoverflow.com/questions/25097179/warning-pool-www-seems-busy-you-may-need-to-increase-pm-start-servers-or-pm

This server has around 12GB of RAM and an 8-core processor, used only for nginx + php-fpm. Each PHP process uses about 15-20MB.
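
As a rough sanity check on those numbers, the sketch below computes how many children fit in RAM and the worst-case footprint of the pm.max_children values tried in this post. The 2GB reserve for nginx and the OS is an assumption, and the 15-20MB figure is treated as non-shared memory per child, which may overestimate real usage.

```python
def max_children_by_memory(total_ram_mb, reserved_mb, per_child_mb):
    """Largest child count whose combined memory fits in RAM after the reserve."""
    return (total_ram_mb - reserved_mb) // per_child_mb

TOTAL_RAM_MB = 12 * 1024  # "around 12GB" from the post
PER_CHILD_MB = 20         # upper end of the 15-20MB per process
RESERVED_MB = 2048        # assumption: headroom for nginx, OS, page cache

ceiling = max_children_by_memory(TOTAL_RAM_MB, RESERVED_MB, PER_CHILD_MB)
print("children that fit in RAM:", ceiling)

# Worst case for each pm.max_children value mentioned in the post.
for configured in (800, 1500, 2000):
    worst_case_mb = configured * PER_CHILD_MB
    print(configured, "children ->", worst_case_mb, "MB,",
          "exceeds RAM" if worst_case_mb > TOTAL_RAM_MB else "fits")
```

Under these assumptions, all three configured values could exceed physical RAM if every child were actually spawned, so raising pm.max_children further is bounded by memory even while CPU looks idle.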

I tried increasing pm.max_children to 1500, but after a while I saw the same "seems busy" warning again.

I then enabled the slow log in php-fpm and also the slow query log in MySQL.

  • In the php-fpm slow log, I found a few PHP pages taking about 5 seconds to complete.
  • In the MySQL slow log, I found some queries examining 2-5 million rows (taking about 5 seconds to complete).
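
To see how 5-second requests translate into occupied children, Little's law (requests in flight = arrival rate × time per request) gives a quick estimate. This is a sketch only: the arrival rates in the loop are hypothetical, since the post does not state actual traffic figures.

```python
import math

def workers_in_flight(requests_per_second, seconds_per_request):
    """Little's law: average number of children busy at the same time."""
    return math.ceil(requests_per_second * seconds_per_request)

def sustainable_rate(max_children, seconds_per_request):
    """Highest arrival rate a pool can absorb before requests start queueing."""
    return max_children / seconds_per_request

SLOW_REQUEST_SECONDS = 5  # from the slow logs described above

# Hypothetical arrival rates, for illustration only.
for rate in (10, 50, 200):
    print(rate, "req/s ->", workers_in_flight(rate, SLOW_REQUEST_SECONDS), "busy children")

print("pool of 800 sustains", sustainable_rate(800, SLOW_REQUEST_SECONDS), "req/s")
```

The children spend those 5 seconds mostly waiting on MySQL rather than using CPU, which would be consistent with a pool that fills up while top still shows the server nearly idle.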

Assuming the slow PHP scripts might be causing a queue or backlog, I added listen.backlog = 24000, and in /etc/security/limits.conf I added soft and hard limits for this particular user so there is room for the slow scripts:

userA    soft    nofile    4096
userA    hard    nofile    65536

Further, in sysctl:

echo "net.core.somaxconn=65536" >> /etc/sysctl.conf
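
Appending the line to /etc/sysctl.conf does not change the running kernel until the file is reloaded (for example with sysctl -p) or the machine reboots. On Linux the kernel also silently caps the backlog passed to listen() at net.core.somaxconn, so the live value matters more than the one in the file. A minimal sketch of that check follows; the function names are illustrative, and the /proc path assumes Linux.

```python
from pathlib import Path

def read_somaxconn(path="/proc/sys/net/core/somaxconn"):
    """Read the value the running kernel is using right now (Linux only)."""
    return int(Path(path).read_text().strip())

def effective_backlog(requested_backlog, somaxconn):
    """The kernel truncates the listen() backlog to net.core.somaxconn."""
    return min(requested_backlog, somaxconn)

# listen.backlog from the pool config against two possible live values:
# 128 (a common default on older kernels) and the 65536 written to sysctl.conf.
for live_value in (128, 65536):
    print("somaxconn", live_value, "-> effective backlog",
          effective_backlog(24000, live_value))
```

On the server itself, `effective_backlog(24000, read_somaxconn())` would show whether the configured listen.backlog is really in effect.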

In the php-fpm master config php-fpm.conf (i.e. outside the pool config) I also added:

rlimit_files = 65536
rlimit_core = 0

My ulimit -Hn says:

524288
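
The value printed by ulimit in a shell belongs to that shell, not necessarily to the running php-fpm children. Entries in /etc/security/limits.conf are applied through PAM sessions, so a service started by the init system may not pick them up. The limits of a live process can be read from /proc/<pid>/limits; the sketch below parses that file's "Max open files" row. The sample text is illustrative, not taken from this server.

```python
def parse_open_files_limit(limits_text):
    """Return (soft, hard) for 'Max open files'; None stands for 'unlimited'."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns after the three-word name: soft limit, hard limit, units.
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    return None

# Illustrative excerpt in the layout of /proc/<pid>/limits.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            4096                 65536                files
"""

print(parse_open_files_limit(SAMPLE_LIMITS))
```

To check a real worker, pass the contents of /proc/<pid>/limits for one of the pool's child PIDs instead of the sample text.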

Further, since php-fpm was getting busy, I found that I can add the following directives to php-fpm so that it restarts itself in that case. But this is not happening; I have to restart php-fpm manually to get the site working again:

[global]
emergency_restart_threshold = 10
emergency_restart_interval = 1m
process_control_timeout = 10s

As said, the above directives do not trigger a restart when the "seems busy" warning appears in php-fpm.log.

So far my guess is that, due to slow PHP scripts, my php-fpm children are being exhausted, causing the 502 errors. I have no control over the PHP code, so I need to come up with a solution by adjusting the server configuration.

I tried increasing pm.max_children to 2000, but the issue remains. Sometimes I also get 504 Gateway Time-out errors.

On the other hand, if I change to pm = ondemand

I first get the following notice:

 listen.backlog(25000) was too low for the ondemand process manager. I updated it for you to 65535

Later I got this warning, and this time a 504 error again:

[11-Nov-2021 06:56:45] WARNING: [pool userA] server reached max_children setting (800), consider raising it.

One thing to note is that there is almost no load on the server in any of these cases: 2-4% resource usage. So my guess is that it is a configuration issue rather than a resource one.

I have been through almost all PHP-FPM related topics here on Server Fault and lots of docs, but with no gain. I am hoping someone can point me in the right direction.

Thanks

Saahib
    Additional information request, please. Any SSD or NVME devices on MySQL Host server? Post on pastebin.com and share the links. From your SSH login root, Text results of: A) SELECT COUNT(*) FROM information_schema.tables; B) SHOW GLOBAL STATUS; after minimum 24 hours UPTIME C) SHOW GLOBAL VARIABLES; D) SHOW FULL PROCESSLIST; AND very helpful information, includes - htop OR top for most active apps, ulimit -a for a Linux/Unix list of limits, iostat -xm 5 3 for IOPS by device and core/cpu count, for server workload tuning analysis to provide suggestions. – Wilson Hauck Nov 12 '21 at 20:58
