tl;dr: No Celerybeat is not suitable for your use case. You have to run just one process of celerybeat
, otherwise your tasks will be duplicated.
I know this is a very old question. I will try to make a small summary because I have the same problem/question (in the year 2018).
Some background: We're running Django application (with Celery) in the Kubernetes cluster. Cluster (EC2 instances) and Pods (~containers) are autoscaled: simply said, I do not know when and how many instances of the application are running.
It's your responsibility to run only one process of the celerybeat
, otherwise, your tasks will be duplicated. [1] There was this feature request in the Celery repository: [2]
Requiring the user to ensure that only one instance of celerybeat
exists across their cluster creates a substantial implementation
burden (either creating a single point-of-failure or encouraging users
to roll their own distributed mutex).
celerybeat should either provide a mechanism to prevent inadvertent
concurrency, or the documentation should suggest a best-practice
approach.
After some time, this feature request was rejected by the author of Celery for lack of resources. [3] I highly recommend reading the entire thread on the Github. People there recommend these project/solutions:
I did not try anything from the above (I do not want another dependency in my app and I do not like locking tasks /you need to deal with fail-over etc./).
I ended up using CronJob in Kubernetes (https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/).
[1] celerybeat - multiple instances & monitoring
[2] https://github.com/celery/celery/issues/251
[3] https://github.com/celery/celery/issues/251#issuecomment-228214951