We have a Kubernetes cluster running some pods with Celery workers. The workers run on Python 3.6 with Celery 3.1.2 (I know, really old; we are working on upgrading it). We have also set up an autoscaling mechanism to add more Celery workers on the fly.
The problem is the following. Let's say we have 5 workers at any given time. Then a lot of tasks come in, increasing the CPU/RAM usage of the pods. That triggers an autoscaling event, adding, let's say, two more Celery worker pods. Those two new workers pick up some long-running tasks. Before they finish running those tasks, Kubernetes triggers a downscaling event, killing those two workers and the long-running tasks along with them.
Also, for legacy reasons, we do not have a retry mechanism if a task is not completed (and we cannot implement one right now).
So my question is: is there a way to tell Kubernetes to wait until the Celery worker has finished all of its pending tasks? I suppose the solution must also include some way to tell the Celery worker to stop receiving new tasks. I know that Kubernetes lets you run scripts to handle this kind of situation, but I do not know what to write in those scripts because I do not know how to make the Celery worker stop receiving tasks.
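To make the question concrete, here is a rough sketch of what I imagine such a script doing. It assumes the script would run as a Kubernetes `preStop` hook inside the worker pod, and that Celery's remote-control calls `app.control.cancel_consumer` and `app.control.inspect` behave on 3.1 the way I expect. The function name `drain_worker`, the queue list, and the timings are all hypothetical; I do not know whether this is the right approach.

```python
import time


def drain_worker(app, worker_name, queues, poll_seconds=5.0, timeout=3600.0,
                 sleep=time.sleep):
    """Ask one worker to stop consuming, then wait until it is idle.

    `app` is assumed to be the Celery application object; `worker_name`
    is the worker's node name (by default something like "celery@<hostname>").
    Returns True once the worker reports no active or reserved tasks,
    or False if `timeout` seconds pass first.
    """
    # Tell only this worker to stop taking new messages from each queue.
    for queue in queues:
        app.control.cancel_consumer(queue, destination=[worker_name])

    inspector = app.control.inspect(destination=[worker_name])
    waited = 0.0
    while waited < timeout:
        active = inspector.active()      # tasks currently executing
        reserved = inspector.reserved()  # tasks prefetched but not started
        # A missing reply means "unknown", so keep waiting rather than
        # assume the worker is idle.
        if active and reserved and worker_name in active and worker_name in reserved:
            if not active[worker_name] and not reserved[worker_name]:
                return True
        sleep(poll_seconds)
        waited += poll_seconds
    return False
```

The idea would be to call something like `drain_worker(app, "celery@" + socket.gethostname(), ["celery"])` from the hook script and only let the pod die once it returns. As far as I understand, the pod's `terminationGracePeriodSeconds` would also have to be at least as long as the timeout, since the hook's run time counts against that grace period.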
Any ideas?