
We just had an issue on two servers (both unrelated, beefy servers, one of them not public-facing) at the same time: both started returning 502 Bad Gateway and I had to manually reload PHP to fix it.

Looking back in the PHP logs, I noticed PHP was reloaded (I'm still trying to work out why - maybe an automated PHP version update by RunCloud?) and then many PHP child processes were spawned and exited, each in under 1 second - very strange, as all historical processes exited after 1500+ seconds as normal.

Server 1 had lots of entries that said: "child XXXX exited on signal 6 (SIGABRT - core dumped) after 0.848228 seconds from start"

Server 2 (not open to the public) had lots of entries that said: "child XXXX exited with code 1 after 0.036807 seconds from start"

I can't seem to figure out what caused it. I've looked through the traffic logs and nothing looks too crazy, and I can't find anything that would have caused the PHP restart; in the past, restarts have only ever happened when we triggered them ourselves via deployment.

I can only put it down to RunCloud running an update on our servers and it breaking something - but that still doesn't explain why, after the PHP reload, the child processes went crazy and broke both servers with 502s, nor why my manual reload of PHP fixed it and everything is back to normal.
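For reference, a rough sketch of where I understand one can look on Ubuntu for an unattended upgrade as the trigger (the php8.1-fpm service name, log paths and dates are assumptions - adjust them for the installed PHP version):

```bash
# Did systemd reload/restart the FPM service around that time?
# (php8.1-fpm is an assumed unit name - substitute your PHP version)
journalctl -u php8.1-fpm --since "2023-05-06" --until "2023-05-08"

# Was a PHP package upgraded by apt or unattended-upgrades at that moment?
grep -i php /var/log/dpkg.log
grep -i php /var/log/apt/history.log
less /var/log/unattended-upgrades/unattended-upgrades.log
```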

Super super strange, any ideas?

Jonathan Bird
  • I've switched from PHP ondemand processes to dynamic processes, but that doesn't explain why it randomly crashed, nor why all my ondemand child processes started and finished in under 1 second – Jonathan Bird May 07 '23 at 00:39
  • Does nginx restart as well? I mean... is it nginx that restarts, or the PHP pools? – Daniele Continenza May 08 '23 at 07:10
  • @DanieleContinenza Nah, nginx didn't restart. PHP only, and then all the pools started and closed immediately, as I mentioned. I restarted PHP and it never happened again, so strange – Jonathan Bird May 08 '23 at 08:02
  • Hm. Does the cron log say anything? What about the daemon logs, syslog or the kernel log? As you pointed out, restarting the pools solved the problem, but it's hard to guess at the originating cause without further investigation of the logs. What OS are the servers running? – Daniele Continenza May 08 '23 at 08:14
  • I'll check those out. The nginx logs didn't show much. Both are Ubuntu. They run different applications, but both use Laravel and Horizon for the queues. The crons are totally different, but Laravel is the framework for both – Jonathan Bird May 08 '23 at 09:36
  • Are they in a "cluster" environment or similar, like high availability or load balancing? Or behind some kind of proxy/firewall? – Daniele Continenza May 08 '23 at 09:43
  • Nope, and no MySQL or Redis on the servers either; both use AWS services. The apps are both standard web apps with Nginx and PHP. My third application had PHP set to dynamic rather than ondemand and didn't have the issue, so I've changed these two servers' configs (see the config sketch after these comments). – Jonathan Bird May 08 '23 at 09:45
  • Some caching system, like APC? – Daniele Continenza May 08 '23 at 09:49
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/253532/discussion-between-jonathan-bird-and-daniele-continenza). – Jonathan Bird May 08 '23 at 09:50
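
For reference on the process-manager switch mentioned in the comments above, a minimal sketch of the two pool modes in an FPM pool config (the values are illustrative only, not a recommendation, and the pool file location - e.g. /etc/php/8.1/fpm/pool.d/ on Ubuntu - varies by setup):

```ini
; ondemand: workers are forked when requests arrive and reaped after idling
pm = ondemand
pm.max_children = 20
pm.process_idle_timeout = 10s

; dynamic: a warm pool of workers is kept alive between requests
;pm = dynamic
;pm.max_children = 20
;pm.start_servers = 4
;pm.min_spare_servers = 2
;pm.max_spare_servers = 6
```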

1 Answer

  1. SIGABRT means PHP itself, or one of its PECL extensions, is crashing with undefined behavior.
  2. Since you get these errors in FPM, it is definitely not a cron job that's to blame.
  3. You can try disabling some extensions such as APCu and/or OPcache preloading, but that's just blind trial and error which may or may not work. To find the cause for certain, you need to collect a core dump and analyze it; a sketch of how is below.
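
A minimal sketch of both steps on Ubuntu, assuming PHP 8.1, the default www pool and the Debian/Ubuntu phpdismod helper (all of those are assumptions - adjust for the actual setup):

```bash
# Blind-trial option: disable APCu for the FPM SAPI only, then restart
sudo phpdismod -s fpm apcu
sudo systemctl restart php8.1-fpm

# Core-dump route: let FPM workers write cores to a known, writable path
sudo sysctl -w kernel.core_pattern=/tmp/core-%e-%p
echo "rlimit_core = unlimited" | sudo tee -a /etc/php/8.1/fpm/pool.d/www.conf
sudo systemctl restart php8.1-fpm

# After the next SIGABRT, open the dump and grab a backtrace
sudo gdb /usr/sbin/php-fpm8.1 /tmp/core-php-fpm8.1-<pid>
(gdb) bt full
```

With the PHP debug symbols installed, the backtrace usually points at the extension (or PHP component) that aborted.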
Sam Dark