
In my project I have a REST API (a Flask server) and workers (Python scripts running in Docker containers). Each worker is a simple infinite loop: make a request to the server, get work. The Flask server sends tasks to the workers.
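Each worker does essentially this (the server URL and the /get-task endpoint are placeholders):

    import time
    import requests

    while True:
        try:
            # poll the Flask server for the next task
            resp = requests.get("http://server:5000/get-task", timeout=5)
            if resp.ok:
                task = resp.json()
                # ... do the work ...
        except requests.RequestException:
            pass  # server unreachable; try again
        time.sleep(1)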

But of course something might go wrong and a Docker container may crash, and at that moment I want the server to know about it. Can you suggest a simple way to do that?

2 Answers


It's quite easy to do this type of thing with Redis, which is a very lightweight, lightning-fast, in-memory data-structure server. It can serve lists, hashes, sets, sorted sets, strings (including atomic counters), and so on across your network, with clients for bash, Python, PHP, C/C++ and just about everything else.

I would use a simple KEY with an EXPIRE set on it, say 10s. Each server and each client in your network then SETs the key named after its hostname (or function) at least every 10s, which resets its timeout. If anyone queries the key and it has expired (i.e. it wasn't refreshed within the last 10s), they know that client/server or host/process has died.
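A minimal sketch with the redis-py client (the Redis host name and the key prefix are assumptions):

    import socket
    import time

    import redis

    r = redis.Redis(host="redis", port=6379)   # hypothetical Redis host

    def heartbeat(every=5, ttl=10):
        """Worker side: refresh the key more often than it expires."""
        key = "alive:" + socket.gethostname()
        while True:
            r.set(key, "1", ex=ttl)   # SET with a 10s EXPIRE
            time.sleep(every)

    def is_alive(hostname):
        """Server side: a missing key means the heartbeat stopped."""
        return r.exists("alive:" + hostname) == 1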

The Redis documentation is at redis.io.


Another, even simpler way to communicate is via the filesystem. Each worker must "touch" (create) a file called /tmp/WORKERXXX.alive every N seconds. The process that monitors the workers then checks every N+1 seconds that the files exist and deletes them; if a worker's file is missing, that worker is restarted.
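A sketch of the checking side, assuming workers named WORKER001, WORKER002, ... and N = 10:

    import os
    import time

    WORKERS = ["WORKER001", "WORKER002"]   # hypothetical worker names
    N = 10

    while True:
        time.sleep(N + 1)
        for name in WORKERS:
            path = f"/tmp/{name}.alive"
            if os.path.exists(path):
                os.remove(path)            # worker must re-create it next cycle
            else:
                print(f"{name} missed its heartbeat - restart it")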

Touching the file doesn't have to be intrusive within your worker's code. It can simply create an extra thread at start-up which runs an infinite loop, sleeping N seconds and then touching its keep-alive file.
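For example (the file name is a placeholder for this worker's own name):

    import threading
    import time
    from pathlib import Path

    N = 10
    HEARTBEAT = Path("/tmp/WORKER001.alive")   # this worker's keep-alive file

    def keep_alive():
        while True:
            HEARTBEAT.touch()   # create/refresh the file
            time.sleep(N)

    # daemon=True: the thread dies with the worker instead of outliving it
    threading.Thread(target=keep_alive, daemon=True).start()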

Mark Setchell

In addition to Mark's solution, and without changing the Python script (the one running the infinite loop), you can leverage docker-compose and add a crude healthcheck that checks whether the python process is running:

kill -0 $(pgrep python)

The command above returns 0 if the process exists; when it does not, the exit code is non-zero and Docker marks the container unhealthy. (Note that an unhealthy status does not by itself restart the container; you still need a restart policy plus something acting on the health status, such as Docker Swarm or an autoheal-style watcher.)

DISCLAIMER: the command above, as is, assumes a single python process running within the container. If you are not running a Python script, or it spawns additional python processes (e.g. via multiprocessing), you will need to modify the command in order to get the right PID.
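Wired into a compose file it might look like this (the service name and the intervals are arbitrary; as noted above, a failing check only marks the container unhealthy):

    services:
      worker:                  # hypothetical service name
        build: .
        restart: always
        healthcheck:
          test: ["CMD-SHELL", "kill -0 $(pgrep python)"]
          interval: 30s
          timeout: 5s
          retries: 3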

Alternatively, you can leverage an existing asynchronous task-queue framework such as Celery and use celery inspect as the healthcheck for each worker in docker-compose - see this post for more detail: HEALTHCHECK of a Docker container running Celery tasks?
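With Celery, the healthcheck command could ping the worker itself (the app module tasks is a placeholder for your own):

    celery -A tasks inspect ping -d celery@$HOSTNAME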

pangyuteng