
I have an application built using Celery, and recently we got a requirement to run certain tasks on a schedule.

I think celerybeat is perfect for this, but I have a few questions:

  1. Is it possible to run multiple celerybeat instances without tasks being duplicated?
  2. How can I make sure that celerybeat is always up and running?

So far I have read these: https://github.com/celery/celery/issues/251 and https://github.com/ybrs/single-beat

It looks like a single instance of celerybeat should be running.
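For what it's worth, the pattern both of those links converge on is a distributed lock: every container tries to acquire a lock, and only the holder actually runs beat. Below is a minimal sketch of that idea using a Redis lock. The key name and app module are hypothetical; it assumes redis-py and a reachable Redis, and it leaves out the TTL refresh that a real tool handles for you:

import subprocess
import sys

import redis

LOCK_KEY = "celerybeat-singleton"  # hypothetical lock key
LOCK_TTL = 60                      # seconds; a real setup must keep refreshing this

client = redis.Redis(host="localhost", port=6379)

# SET with nx=True is atomic, so exactly one container wins the lock.
if client.set(LOCK_KEY, "holder", nx=True, ex=LOCK_TTL):
    # We hold the lock: run beat in the foreground under this process.
    sys.exit(subprocess.call(["celery", "-A", "myapp", "beat", "-l", "info"]))
else:
    print("another celerybeat instance holds the lock; exiting")

single-beat is essentially a production-ready version of this: it wraps the command, keeps the lock refreshed, and lets a standby instance take over if the active one dies.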

I'm running the application inside AWS Elastic Beanstalk Docker containers, and the Celery workers are also Docker containers (so it's quickly scalable when needed).

It would be best to have celerybeat run through supervisord along with the celery workers, but it seems this is not the proper way to do it.

At the same time, having that single instance of celerybeat would require manual provisioning/startup and monitoring.

DmitrySemenov
  • Dmitry were you able to figure something out? I have the same issue. I have an autoscaling app in ElasticBeanstalk and the beat scheduler is running multiple times – jplaza Sep 11 '18 at 14:02
  • @jplaza we ended up having just a single instance (dockerized) with AWS ECS deployed as a single task – DmitrySemenov Sep 11 '18 at 14:33

3 Answers


To answer your 2 questions:

  1. If you run several celerybeat instances you get duplicated tasks, so AFAIK you should run only a single celerybeat instance.

  2. I'm using supervisord, as you mentioned, to run the celery workers and the celerybeat worker as daemons, so they should always be up & running.

My supervisord config:

[program:my_regular_worker]
command=python2.7 /home/ubuntu/workspace/src/manage.py celery worker -Q my_regular_worker-queue_name -c 1 -l info --without-mingle
process_name=my_regular_worker
directory=/home/ubuntu/workspace/src
autostart=true
autorestart=true
user=ubuntu
stdout_logfile=/tmp/my_regular_worker.log
redirect_stderr=true



[program:my_celerybeat_worker]
; -B embeds the beat scheduler in this worker, so run exactly one of these;
; remaining settings mirror the regular worker above
command=python2.7 /home/ubuntu/workspace/src/manage.py celery worker -Q my_celerybeat_worker-queue_name -c 1 -l info --without-mingle -B -s /tmp/celerybeat-schedule
process_name=my_celerybeat_worker
directory=/home/ubuntu/workspace/src
autostart=true
autorestart=true
user=ubuntu
stdout_logfile=/tmp/my_celerybeat_worker.log
redirect_stderr=true
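For completeness, the schedule itself lives in the Celery configuration. A minimal sketch follows; the app and task names are hypothetical, and newer Celery versions spell the setting beat_schedule while Celery 3.x used CELERYBEAT_SCHEDULE:

from datetime import timedelta

from celery import Celery

app = Celery("myapp", broker="redis://localhost:6379/0")

app.conf.beat_schedule = {
    "add-every-minute": {
        "task": "myapp.tasks.add",         # must match the registered task name
        "schedule": timedelta(minutes=1),  # run once per minute
        "args": (13, 42),
    },
}

With the -B flag above, that single worker both schedules and consumes these tasks, which is why it must never be scaled beyond one process.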
NoamG

I just found this solution as a celery-beat replacement: RedBeat (blog post).

I haven't used it yet, though.
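If anyone wants to try it: RedBeat stores the schedule in Redis and uses a Redis lock so that only one beat instance is active at a time. A minimal configuration sketch, assuming the celery-redbeat package is installed (the app name and URLs are placeholders):

from celery import Celery

app = Celery("myapp", broker="redis://localhost:6379/0")

# Keep the schedule in Redis instead of the local celerybeat-schedule file.
app.conf.beat_scheduler = "redbeat.RedBeatScheduler"
app.conf.redbeat_redis_url = "redis://localhost:6379/1"

Beat is then started as usual, e.g. celery -A myapp beat -l info, or by passing -S redbeat.RedBeatScheduler instead of setting it in the config.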

mangolier
  • Disagree with the downvote: "didn't use it" never sounds promising, but RedBeat apparently actually is a purpose-built solution to this exact problem by the folks at Heroku – cmc Jan 14 '19 at 14:43

You may run multiple instances of celery beat and tasks will not be duplicated.

Take a look at the celery.beat.Scheduler class, specifically the reserve() function. The scheduler will reserve a task before submitting it to the grid for execution. This prevents another instance of celery beat from submitting the same task.

We use MongoDB as a backing store for our scheduled tasks. Here is a sample document showing that the task has been reserved by one of the schedulers.

{
  "startdate": "2015-07-06 00:00:00", 
  "task": "cobalt.grid.tasks_facts.task_add", 
  "enddate": "2018-01-01 00:00:00", 
  "args": "[13.0, 42.0]", 
  "enabled": "True", 
  "last_run_at": "2015-08-13 15:04:49.058000", 
  "interval": "{u'every': u'1', u'period': u'minutes'}", 
  "relative": "False", 
  "total_run_count": "12", 
  "kwargs": "{}", 
  "reserved": "compute2:25703", 
  "_id": "ObjectId(55ccaf7784a3e752e73b08c2)", 
  "options": "{}"
}

http://celery.readthedocs.org/en/latest/reference/celery.beat.html#celery.beat.Scheduler
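The answer doesn't include its scheduler code, but the reservation it describes amounts to an atomic compare-and-set on that document. A rough, hypothetical sketch with pymongo follows (the field names are taken from the sample above, everything else is made up, and the comments below dispute that stock schedulers behave this way):

import os
import socket

from pymongo import MongoClient, ReturnDocument

coll = MongoClient()["scheduler"]["schedules"]  # hypothetical db/collection
holder = "%s:%s" % (socket.gethostname(), os.getpid())  # e.g. "compute2:25703"

# Atomically claim the entry only if no scheduler has reserved it yet,
# so at most one beat instance gets the document back and sends the task.
entry = coll.find_one_and_update(
    {"task": "cobalt.grid.tasks_facts.task_add", "reserved": None},
    {"$set": {"reserved": holder}},
    return_document=ReturnDocument.AFTER,
)
if entry is not None:
    pass  # this instance won the reservation and may submit the task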

Damian
  • I cannot reproduce this behavior with _djcelery_'s `DatabaseScheduler`. I'm using an interval schedule `timedelta(seconds=5)` and an _SQLite_ database for testing. Tasks get inserted into the queue and processed twice if two `beat` processes are running. – Feuermurmel Jun 09 '16 at 17:23
  • I could be wrong here, but isn't this "reservation" process just used so that multiple workers don't execute the *same* job? It doesn't have anything to do with synchronizing multiple schedulers so that they don't launch duplicate jobs. – booshong Oct 19 '16 at 20:19
  • @Feuermurmel I confirm that. Celerybeat must be run only once. – Karim N Gorjux Oct 08 '17 at 22:40