
Lately we've been running into some issues with our php-fpm processes spinning out of control and causing the site to become unresponsive. There's some obvious php-fpm configuration tuning that needs to be done, but I'd also like to implement a reasonable livenessProbe health check for the php-fpm container that will restart the container when the probe fails.

I've dug up several resources on how to ping the server as a health check (e.g. https://easyengine.io/tutorials/php/fpm-status-page/), but I have yet to find a good answer on what to be on the lookout for. Will the /ping route return something other than 'pong' if the server is effectively dead? Will it just time out? Assuming the latter, what is a reasonable timeout limit?

Running some tests of my own, I notice that a healthy php-fpm server will return the 'pong' response quickly:

# time curl localhost/ping
pong
real    0m0.040s
user    0m0.006s
sys     0m0.001s
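
For more than a single sample, a loop like the following can record how long each ping takes, which helps when picking a timeout. This is a minimal sketch rather than a definitive tool: `PING_URL`, `sample_ping` and `slowest` are hypothetical names, and it assumes `curl` is available in the container.

```shell
#!/bin/sh
# PING_URL is an illustrative variable; it defaults to the URL used in the test above.
PING_URL="${PING_URL:-localhost/ping}"

# Print curl's total request time (in seconds) for each of N pings, one per line.
# A request that exceeds --max-time is cut off at roughly 5 seconds.
sample_ping() {
  n="$1"
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -s -o /dev/null --max-time 5 -w '%{time_total}\n' "$PING_URL"
    i=$((i + 1))
  done
}

# Read one number per line on stdin and print the largest.
slowest() {
  LC_ALL=C sort -n | tail -n 1
}
```

Running `sample_ping 20 | slowest` while the load test is active prints the worst latency seen across 20 pings.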

I simulated heavy load and indeed it took 1-3 seconds for the 'pong' response, and that coincided with the site becoming unresponsive. Based on that I drew up a draft of a livenessProbe that will fail and restart the container if the liveness probe script takes longer than 2 seconds on 2 consecutive probes:

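livenessProbe:
  exec:
    command:
    - sh
    - -c
    - timeout 2 /var/www/livenessprobe.sh
  initialDelaySeconds: 15
  periodSeconds: 3
  # Kubernetes 1.20+ enforces timeoutSeconds (default: 1) on exec probes,
  # so it must exceed the 2 seconds given to `timeout` above.
  timeoutSeconds: 3
  successThreshold: 1
  failureThreshold: 2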
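
To judge how aggressive these numbers are, it can help to work out roughly how long php-fpm has to stay slow before the container is restarted. The sketch below is a simplified model, not a description of the kubelet's exact scheduling: it assumes a probe starts exactly every `periodSeconds`, that every probe during the stall fails at the 2-second `timeout`, and it ignores the restart time itself. `restart_window` is a hypothetical helper name.

```shell
#!/bin/sh
# restart_window PERIOD FAILURES TIMEOUT
# Prints the best-case and worst-case seconds between the start of a stall
# and the last failed probe that triggers the restart.
restart_window() {
  period="$1"
  failures="$2"
  probe_timeout="$3"
  # Best case: the stall begins just before a probe starts.
  best=$(( (failures - 1) * period + probe_timeout ))
  # Worst case: the stall begins just after a probe started, so a full period passes first.
  worst=$(( failures * period + probe_timeout ))
  echo "$best $worst"
}

# periodSeconds=3, failureThreshold=2, timeout=2 from the draft above
restart_window 3 2 2   # prints "5 8"
```

Under these assumptions the container is restarted after about 5 to 8 seconds of slow responses, so a load spike of that length is enough to trigger it.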

And the probe script is simply this (there are reasons why this needs to be a shell script and not a direct httpGet from the livenessProbe, which I won't get into):

  #!/bin/bash

  curl -s localhost/ping
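
One thing to note about the script as written: `curl -s` exits with status 0 for any HTTP response, including an error page, so the probe only fails when the request hangs or the connection is refused. A stricter variant could also check the response body. The following is a sketch under assumptions: the function names and the `PING_URL` and `PING_TIMEOUT` variables are hypothetical, and it assumes php-fpm's `ping.response` is left at its default of `pong`.

```shell
#!/bin/sh
# Illustrative variables, not part of the original script.
PING_URL="${PING_URL:-localhost/ping}"
PING_TIMEOUT="${PING_TIMEOUT:-2}"

# Print the response body. curl exits non-zero on HTTP errors (-f)
# or when the request takes longer than PING_TIMEOUT seconds (--max-time).
fetch_ping() {
  curl -sf --max-time "$PING_TIMEOUT" "$PING_URL"
}

# Succeed only when the argument is exactly "pong".
is_pong() {
  [ "$1" = "pong" ]
}

# Fail if the request fails or the body is anything other than "pong".
check_liveness() {
  body="$(fetch_ping)" || return 1
  is_pong "$body"
}
```

Ending the script with a call to `check_liveness` makes its exit status the probe result, and with `--max-time` in place the outer `timeout 2` wrapper becomes a second safety net rather than the only limit.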

Now I don't know if I'm being too aggressive or too conservative. I'll be running a canary deploy to test this, but in the meantime I'd like to get some feedback from others who have implemented health checks on php-fpm servers. Bonus points if it's in a Kubernetes context.

erstaples

1 Answer


In case someone is still interested in this topic: I was looking into much the same thing (php-fpm monitoring for pods running in Kubernetes).

I added the health-check setup from https://github.com/renatomefi/php-fpm-healthcheck to my container (the one running php-fpm) to check whether php-fpm is playing along nicely :) It is pretty simple and gets the job done: it marks the container as "bad" when some of the values go beyond your predefined limits.

scones
Eric-PR
  • Most places where I've found an answer to this question have this answer as the top answer, but I have had very bad luck with it under load. If your pods are terminating and restarting while processing and queueing requests, it jettisons those requests and forces the load onto fewer pods, potentially causing a cascade across your entire deployment. I think the answer can't be, "restart it if it's doing a lot of work," but I haven't found a better answer yet. – Trip Apr 19 '23 at 22:20