so I have a Docker image that runs a Celery worker via Supervisor and works just fine on a single-container Docker Elastic Beanstalk environment (pretty long tasks, so acks_late=True, concurrency=1 and prefetch_multiplier=1).
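For reference, this is roughly the worker configuration in question, as a minimal sketch; the module layout is a placeholder and I'm assuming the new-style lowercase Celery setting names:

```python
# celeryconfig.py -- sketch of the settings mentioned above
# (project layout is a placeholder, not our actual setup)
task_acks_late = True            # don't ack until the task has finished
worker_prefetch_multiplier = 1   # fetch at most one task ahead
worker_concurrency = 1           # a single worker process per container
```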
The trouble is that I would like to scale the number of instances based on the workers' actual task load, while EB only lets you scale on overall network and CPU metrics.
Adding a rule to scale up on CPU load works OK, but there is no guarantee that EB won't decide to scale down while an instance is in the middle of a task. That would trigger a docker stop and effectively kill any running Celery task that can't finish within the shutdown grace period (10 seconds by default, if I'm not mistaken).
Ideally I would need a monitor based on both CPU activity and the tasks in the queue; in pseudocode, something like:

    every check interval:
        if the task queue is empty (or the workers are not busy):
            if more than 1 instance is running:
                scale down by 1 instance
        else:
            if CPU load is above the threshold:
                scale up by 1 instance
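To make it concrete, here's a rough sketch of that loop in Python against the EC2 Auto Scaling API. The group name, the thresholds and the two check helpers are placeholders (how to implement those checks is exactly part of the question below), so treat this as an illustration rather than a working implementation:

```python
import time
import boto3

autoscaling = boto3.client("autoscaling")

ASG_NAME = "celery-worker-asg"  # placeholder Auto Scaling group name
CHECK_INTERVAL = 60             # seconds between checks
CPU_THRESHOLD = 70.0            # percent

def queue_is_empty():
    """Placeholder: inspect the broker / workers for pending or active tasks."""
    raise NotImplementedError

def average_cpu_load():
    """Placeholder: e.g. read the group's CPUUtilization metric from CloudWatch."""
    raise NotImplementedError

while True:
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[ASG_NAME]
    )["AutoScalingGroups"][0]
    running = group["DesiredCapacity"]

    if queue_is_empty():
        if running > 1:
            # Note: this doesn't choose WHICH instance gets terminated,
            # so a busy worker could still be killed (same problem as EB).
            autoscaling.set_desired_capacity(
                AutoScalingGroupName=ASG_NAME, DesiredCapacity=running - 1
            )
    elif average_cpu_load() > CPU_THRESHOLD:
        autoscaling.set_desired_capacity(
            AutoScalingGroupName=ASG_NAME, DesiredCapacity=running + 1
        )

    time.sleep(CHECK_INTERVAL)
```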
Now the problem is that this level of logic doesn't seem achievable in EB and looks more like a fit for ECS, but I am not sure about the following:
- what kind of Celery monitor should we implement, and where should that code run? The usual Celery worker monitoring commands don't seem like a good way to tell how busy a worker is, and we also need to handle the additional complexity of running the worker inside Docker
- where should the cluster scaling function run? After a chat with an AWS engineer it seems AWS Lambda could be a potential solution, but hooking the instances' load reports up to Lambda functions looks quite complicated and hard to maintain (a rough sketch of what I'm picturing for this and the previous point follows the list)
- as a bonus question, if we need to migrate to ECS we will also have to rewrite our deploy scripts to trigger the version swap manually, since this is currently managed by EB. What would be the best way to do so? (a boto3 sketch of what I mean is below as well)
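To illustrate the first two points, the most obvious thing I can come up with is something along these lines (which is exactly the kind of "usual" monitoring I have doubts about): poll the workers through Celery's inspect API and publish the result as a custom CloudWatch metric that a scaling policy, or the loop above, could act on. The broker URL, the metric names and the idea of running it as a scheduled Lambda are all assumptions on my part, and a Lambda would of course need network access to the broker:

```python
# Rough sketch: report worker busyness as a custom CloudWatch metric.
# Broker URL, namespace and metric names are placeholders; whether this
# should run in Lambda, a sidecar container or cron is the open question.
import boto3
from celery import Celery

app = Celery(broker="redis://my-broker:6379/0")  # placeholder broker URL
cloudwatch = boto3.client("cloudwatch")

def report_worker_load():
    insp = app.control.inspect(timeout=2.0)
    active = insp.active() or {}       # {worker_name: [currently running tasks]}
    reserved = insp.reserved() or {}   # tasks prefetched but not yet started

    busy_workers = sum(1 for tasks in active.values() if tasks)
    reserved_tasks = sum(len(tasks) for tasks in reserved.values())

    cloudwatch.put_metric_data(
        Namespace="Celery/Workers",    # placeholder namespace
        MetricData=[
            {"MetricName": "BusyWorkers", "Value": busy_workers, "Unit": "Count"},
            {"MetricName": "ReservedTasks", "Value": reserved_tasks, "Unit": "Count"},
        ],
    )

def lambda_handler(event, context):
    # e.g. triggered on a schedule by CloudWatch Events / EventBridge
    report_worker_load()
```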
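And for the bonus question, my understanding is that the manual version swap would boil down to something like the following with boto3; the cluster, service, family and image names are all placeholders and I haven't tested any of this:

```python
# Rough sketch of a "version swap" on ECS with boto3: register a new task
# definition revision for the new image and point the service at it.
import boto3

ecs = boto3.client("ecs")

def deploy(image_uri):
    # Register a new revision of the task definition with the new image
    task_def = ecs.register_task_definition(
        family="celery-worker",                 # placeholder family name
        containerDefinitions=[{
            "name": "worker",
            "image": image_uri,                 # e.g. a freshly pushed ECR tag
            "memory": 512,
            "essential": True,
        }],
    )["taskDefinition"]

    # Update the service to the new revision; ECS then rolls the tasks over
    ecs.update_service(
        cluster="celery-cluster",               # placeholder cluster name
        service="celery-worker-service",        # placeholder service name
        taskDefinition=task_def["taskDefinitionArn"],
    )

deploy("my-ecr-repo/celery-worker:new-tag")     # placeholder image URI
```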
any help appreciated, thanks!