celery beat HA - using pacemaker?

Question

As far as I know about celery, celery beat is a scheduler considered as SPOF. It means the service crashes, nothing will be scheduled and run.

My case is that, I will need a HA set up with two schedulers: master/slave, master is making some calls periodically(let's say every 30 mins) while slave can be idle.

When master crashes, the slave needs to become the master and pick up the left over from the dead master, and carry on the periodic tasks. (leader election)

The requirements here are:

the task is scheduled every 30mins (this can be achieved by celery beat)
the task is not atomic, it's not just a call every 30 mins which either fails or succeeds. Let's say, every 30 mins, the task makes 50 different calls. If master finished 25 and crashed, the slave is expected to come up and finish the remaining 25, instead of going through all 50 calls again.
when the dead master is rebooted from failure, it needs to realize there is a master running already. By all means, it doesn't need to come up as master and just needs to stay idle til the running master crashes again.

Is pacemaker a right tool to achieve this combined with celery?

score 1 · Answer 1 · answered Sep 11 '13 at 09:24

The short answer is "yes." Pacemaker will do what you want, here.

The longer answer is that your architecture is tricky due to the requirement to restart in the middle of a sequence.

You have two solutions available here. The first is to use some sort of database (or a DRBD file system) to record the fact that 25 of the 50 calls have been completed. The problem with this isn't the 24 completed calls, or the 25 yet-to-be-completed, it's the one that the system was doing, when it crashed. Call #25, say. If C25 wasn't yet started then you're OK. The slave will fire up under Pacemaker control, the DRBD file system will fail over, and the new master will execute #25 through #50. What happens though if #25 was called but the old master hadn't yet marked it as such?

You can architect it so that it marks the call as complete before it actually executes it, in which case, C25 won't get called on this particular occasion or you can mark it as complete after the call in which case C25 will get called twice.

Ideally, you would make the calls idempotent. This is your second option. In which case, it doesn't matter if C1 -> C25 get called again because there's no repeat affect. C26 -> C50 only get called a single time. I don't know enough about your architecture to say which would work, but hopefully this helps.

Pacemaker will certainly handle failing over. Add DRBD and you can save state between the two systems. However, you will need to address the partial-call issue yourself.

celery beat HA - using pacemaker?

1 Answers1