I am running into an issue on an ECS cluster including multiple Celery workers when the cluster requires up-scaling.
Some background:
- I have a task which is running potentially for a few hours.
- Celery workers on an ECS cluster are currently scaled based on queue depth using Flower. Whenever the queue depth is larger than 1, it scales up a worker to potentially receive more tasks.
- The broker used is Redis.
- I have set the
worker_prefetch_multiplier
to 1, and each worker's concurrency equals 4.
The problem definition: Because of these settings, each of the workers prefetches 4 tasks, before filling the queue depth. So let's say we have a single worker running, it requires 8 tasks to be invoked before the queue depth fills to 1 on the 9th task. 4 tasks will be in the STARTED state and 4 tasks will be in the RECEIVED state. Whenever, scaling up the number of worker nodes to 2, only the 9th task will be send to this worker. However, this means that the 4 tasks in the RECEIVED state are "stuck" behind the 4 tasks in the STARTED state for potentially a few hours, which is undesirable.
Investigated solutions:
- When searching for a solution one finds in Celery's documentation (https://docs.celeryproject.org/en/stable/userguide/optimizing.html) that the only way to disable prefetching is to use
acks_late=True
for the tasks. It indeed solves the problem that no tasks are prefetched, but it also causes other problems like replicating tasks on newly scaled worker nodes, which is DEFINITELY not what I want. - Also ofter the setting
-O fair
on the worker is considered to be a solution, but seemingly it still creates tasks in the RECEIVED state.
Currently, I am thinking of a little complex solution to this problem, so I would be very happy to hear other solutions. The current proposed solution is to set the concurrency to -c 2
(instead of -c 4
). This would mean that 2 tasks will be prefetched on the first worker node and 2 tasks are started. All other tasks will end up in the queue, requiring a scaling event. Once ECS scaled up to two worker nodes, I will scale the concurrency of the first worker from 2 to 4 releasing the prefetched tasks.
Any ideas/suggestions?