I am putting together a Celery-based data ingestion pipeline. One thing I do not see anywhere in the documentation is how to build a flow where workers are only running when there is work to be done (which seems like a major gap in Celery's design, honestly).
I understand Celery itself won't handle autoscaling of actual servers; that's fine. But when I simulate this, Flower doesn't see the work that was submitted unless the worker was online when the task was submitted. Why? I'd love a world where I'm not paying for servers unless there is actual work to be done.
Workflow:
Imagine a while loop that's adding new data to be processed using the `celery_app.send_task` method.
I have custom code that sees there are N messages in the queue. It spins up a server and starts a Celery worker for that task.
The Celery worker comes online and does the work.
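For concreteness, the producer side and the scale-up check look roughly like this (a minimal sketch; the `pipeline.process_record` task name, the `ingest` queue, and the Redis broker URL are placeholders, not my real code):

```python
from celery import Celery

# Placeholder broker URL and app name -- stand-ins for the real setup.
celery_app = Celery("ingest", broker="redis://localhost:6379/0")

def produce(records):
    """Producer loop: enqueue work by task name via send_task,
    with no worker online and without importing the task module."""
    for record in records:
        celery_app.send_task(
            "pipeline.process_record",  # name the worker will register once it starts
            args=[record],
            queue="ingest",
        )

def queued_message_count():
    """Scale-up check: with the default Redis broker, a queue is just a
    Redis list named after the queue, so its length is the backlog."""
    with celery_app.connection_or_acquire() as conn:
        return conn.default_channel.client.llen("ingest")

# Once my orchestration code sees N messages, it provisions a server and
# launches a worker on it, roughly:
#   celery -A pipeline worker -Q ingest --loglevel=info
```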
BUT.
Flower has no record of that task, even though I can see the broker has a "message", and while watching the output of the worker, I can see it did its thing.
If I keep the worker online and then submit a task, Flower monitors everything just fine and dandy.
Anyone know why?