I am using Redash, where supervisord manages the Redash worker processes.
It uses a supervisord eventlistener for healthchecks.
The healthcheck eventlistener in the supervisord worker configuration looks like this:
[eventlistener:worker_healthcheck]
serverurl=AUTO
command=./manage.py rq healthcheck
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
events=TICK_60
When the above event listener runs as part of supervisord, the output in the logs looks like this:
2023/02/01 11:34:01 [worker_healthcheck] WorkerHealthcheck: Worker rq:worker:cc79d5a02143436f99d4bccf5af64c5d healthcheck: Is busy? False. Seen lately? False (241 seconds ago). Has nothing to do? True (0 jobs in watched queues). ==> Is healthy? True
2023/02/01 11:34:01 [worker_healthcheck] `RQ Worker Healthcheck` check succeeded for process worker-1
2023/02/01 11:35:01 [worker_healthcheck] Received TICK_60 event from supervisor
2023/02/01 11:35:01 [worker_healthcheck] Performing `RQ Worker Healthcheck` check for process name worker-0
2023/02/01 11:35:01 [worker_healthcheck] Performing `RQ Worker Healthcheck` check for process name worker-1
2023/02/01 11:35:01 [worker_healthcheck] WorkerHealthcheck: Worker rq:worker:ca2878d2e80e4c47a8414478ed15873f healthcheck: Is busy? False. Seen lately? False (304 seconds ago). Has nothing to do? True (0 jobs in watched queues). ==> Is healthy? True
2023/02/01 11:35:01 [worker_healthcheck] `RQ Worker Healthcheck` check succeeded for process worker-0
RESULT 2
OKREADY
But when I take that command out and run it directly in the container, it does not give me the same output, and I have to interrupt it after some time (^C).
Note: to run this command I also need to export SUPERVISOR_SERVER_URL=http://localhost:9001
redash@redash-adhocworker-778fc5dbb6-xrnp9:/app$ ./manage.py rq healthcheck
2023/02/01 11:34:14 [worker_healthcheck] Starting the health check for worker process Checks config: [(<class 'redash.cli.rq.WorkerHealthcheck'>, {})]
2023/02/01 11:34:14 [worker_healthcheck] Installing signal handlers.
READY
^C2023/02/01 11:34:58 [worker_healthcheck] Got signal 2
2023/02/01 11:34:59 [worker_healthcheck] Health check for worker process has been told to stop.
2023/02/01 11:34:59 [worker_healthcheck] Done.
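As far as I understand the supervisor event listener protocol, the READY line means the listener is waiting for an event on stdin, which would explain why the manual run just sits there until ^C. Below is a minimal Python sketch of building a synthetic TICK_60 event in that wire format (a header line of key:value tokens, then a payload of len bytes). The serial numbers and the pool name are made up for illustration, and I have not verified how the Redash command behaves once stdin is closed.

```python
import sys
import time


def build_tick_event(eventname="TICK_60", when=None, serial=1):
    """Build one event in supervisor's listener wire format: header line, then payload."""
    when = int(time.time()) if when is None else when
    payload = "when:%d" % when
    # Illustrative header values; "worker_healthcheck" is assumed to be the pool name.
    header = (
        "ver:3.0 server:supervisor serial:%d pool:worker_healthcheck "
        "poolserial:%d eventname:%s len:%d\n"
        % (serial, serial, eventname, len(payload))
    )
    return header + payload


def parse_header(line):
    """Split a header line into a dict of its key:value tokens."""
    return dict(token.split(":", 1) for token in line.split())


if __name__ == "__main__":
    # Emit a single synthetic tick so it can be piped into a listener.
    sys.stdout.write(build_tick_event())
```

Saved under a hypothetical name such as make_tick.py, the output could presumably be piped in with python3 make_tick.py | ./manage.py rq healthcheck, with SUPERVISOR_SERVER_URL still exported as before.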
When the healthcheck continuously fails, the logs look like this:
2023/02/01 06:22:01 [worker_healthcheck] Received TICK_60 event from supervisor
2023/02/01 06:22:01 [worker_healthcheck] No processes in state RUNNING found for process worker
RESULT 2
OKREADY
Questions
- How can I run the healthcheck script (./manage.py rq healthcheck) in a terminal, outside of supervisord, in the same way, to get the healthcheck status?
- Can I add the autorestart flag to the eventlistener configured with the healthcheck, and will autorestart restart all the programs configured in supervisord?
- By trial and error I found the address for SUPERVISOR_SERVER_URL, which is needed to run the healthcheck (./manage.py rq healthcheck). How can I find it correctly?
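On the SUPERVISOR_SERVER_URL point: my understanding is that supervisord derives this value from the [unix_http_server] or [inet_http_server] section of its config (preferring the unix socket) and exports it to the processes it spawns. A minimal Python sketch of reading it from the config text follows. It assumes the config parses with the standard configparser and does not handle supervisor-specific expansions such as %(here)s; the function name is my own.

```python
import configparser


def server_url_from_config(text):
    """Guess the supervisor RPC URL from the text of a supervisord.conf."""
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, inline_comment_prefixes=(";",)
    )
    parser.read_string(text)
    # A unix domain socket is preferred when one is configured.
    if parser.has_option("unix_http_server", "file"):
        return "unix://" + parser.get("unix_http_server", "file")
    # Otherwise fall back to the inet socket; port may be "9001" or "host:9001".
    if parser.has_option("inet_http_server", "port"):
        host, _, number = parser.get("inet_http_server", "port").rpartition(":")
        if host in ("", "*"):
            host = "localhost"
        return "http://%s:%s" % (host, number)
    return None
```

For example, a config containing [inet_http_server] with port = 127.0.0.1:9001 would give http://127.0.0.1:9001.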
Any other suggestions are welcome on how I can restart supervisord programs when no processes are found in the RUNNING state.
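One direction I have been considering is supervisord's XML-RPC interface. The sketch below is in Python and assumes the HTTP server is reachable at http://localhost:9001 and that the workers live in a group named worker (both guesses based on my setup above). It uses supervisor.getAllProcessInfo and supervisor.startProcess, and only picks processes in the STOPPED, EXITED or FATAL states, since starting one that is already starting or backing off should raise a fault.

```python
import xmlrpc.client

# States in which supervisord is no longer trying to run the process.
STOPPED_STATES = ("STOPPED", "EXITED", "FATAL")


def stopped_workers(process_infos, group="worker"):
    """Return 'group:name' for every process in the group that is fully stopped."""
    return [
        "%s:%s" % (info["group"], info["name"])
        for info in process_infos
        if info["group"] == group and info["statename"] in STOPPED_STATES
    ]


def restart_stopped(server_url="http://localhost:9001", group="worker"):
    """Ask supervisord to start each stopped process in the group (HTTP URLs only)."""
    proxy = xmlrpc.client.ServerProxy(server_url + "/RPC2")
    names = stopped_workers(proxy.supervisor.getAllProcessInfo(), group)
    for name in names:
        proxy.supervisor.startProcess(name)
    return names
```

Calling restart_stopped() from a cron job or a separate listener is only one option; I am not sure whether it is the recommended approach.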