
I have a reminder-type app that schedules tasks in Celery using the `eta` argument. If the parameters of a reminder change (e.g. the reminder time), I revoke the previously sent task and queue a new one.
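For reference, the reschedule flow described above can be sketched roughly as follows. This is a hedged sketch: `Reminder` and `reschedule` are hypothetical names, and `send`/`revoke` are injected callables standing in for a task's `apply_async(..., eta=...)` (whose result's `.id` is the task ID) and `app.control.revoke(task_id)`, so the logic can be exercised without a broker.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class Reminder:
    """Hypothetical reminder record with only the fields needed here."""
    id: int
    remind_at: datetime
    task_id: Optional[str] = None


def reschedule(reminder: Reminder,
               new_time: datetime,
               send: Callable[[int, datetime], str],
               revoke: Callable[[str], None]) -> str:
    """Revoke the previously queued task (if any), then queue a new one.

    `send` stands in for something like
    send_reminder.apply_async(args=[reminder.id], eta=new_time).id
    and `revoke` for app.control.revoke(task_id) (both assumptions).
    """
    if reminder.task_id is not None:
        # Only workers running right now will hear about this revoke.
        revoke(reminder.task_id)
    reminder.remind_at = new_time
    reminder.task_id = send(reminder.id, new_time)
    return reminder.task_id
```

The comment on the `revoke` call is exactly the problem in this question: a worker started after that line runs never learns that the old task ID was revoked.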

I was wondering if there's a good way to keep track of revoked tasks across celeryd restarts. I'd like to be able to scale celeryd processes up and down on the fly, but it seems that any celeryd process started after the revoke command was sent will still execute the task.

One option is to keep a list of revoked task IDs, but that list would grow without bound. Pruning it requires a guarantee that the task is no longer in the RabbitMQ queue, which doesn't seem to be possible.

I've also tried using a shared --statedb file for all the celeryd workers, but the statedb file seems to be written only when a worker terminates, so it isn't suitable for what I want to accomplish.

Thanks in advance!

Eric Wang

3 Answers


Interesting problem. I think it should be easy to solve using broadcast commands: when a new worker starts up, it asks all the other workers to send their revoked task IDs to it. This takes two new remote control commands, and you can easily add new commands with @Panel.register:

Module control.py:

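from celery.worker import state
from celery.worker.control import Panel

@Panel.register
def bulk_revoke(panel, ids):
    # Merge a batch of revoked task IDs into this worker's revoked set.
    state.revoked.update(ids)

@Panel.register
def broadcast_revokes(panel, destination):
    # Send every ID this worker has revoked to the worker(s) in `destination`
    # (a list of hostnames).
    panel.app.control.broadcast("bulk_revoke",
                                arguments={"ids": list(state.revoked)},
                                destination=destination)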

Add it to CELERY_IMPORTS:

CELERY_IMPORTS = ("control", )

The only remaining piece is to make a new worker trigger broadcast_revokes at startup. I guess you could use the worker_ready signal for this:

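from celery import current_app as celery
from celery.signals import worker_ready

def request_revokes_at_startup(sender=None, **kwargs):
    # Ask every running worker to run broadcast_revokes, replying only to
    # this new worker. (The new worker also receives the request and sends
    # its own, still empty, set to itself, which is harmless.)
    celery.control.broadcast("broadcast_revokes",
                             arguments={"destination": [sender.hostname]})

worker_ready.connect(request_revokes_at_startup)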
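To see the data flow without a broker, here is a toy in-process model of the handshake. `Worker` and `Cluster` are hypothetical stand-ins: a plain `set` plays the role of `celery.worker.state.revoked`, and method calls replace the `bulk_revoke`/`broadcast_revokes` control messages. It is a sketch of the idea, not Celery's actual behavior.

```python
class Worker:
    """Toy stand-in for one celeryd process and its in-memory revoked set."""

    def __init__(self, hostname):
        self.hostname = hostname
        self.revoked = set()  # plays the role of celery.worker.state.revoked

    def bulk_revoke(self, ids):
        # Mirrors the bulk_revoke control command: merge the incoming IDs.
        self.revoked.update(ids)


class Cluster:
    """Toy broadcast bus: each call reaches only the workers running right now."""

    def __init__(self):
        self.workers = {}

    def revoke(self, task_id):
        # An ordinary revoke broadcast; workers started later never see it.
        for worker in self.workers.values():
            worker.revoked.add(task_id)

    def start_worker(self, hostname):
        new = Worker(hostname)
        # worker_ready -> "broadcast_revokes" to the running workers, each of
        # which answers with "bulk_revoke" addressed to the newcomer.
        for peer in self.workers.values():
            new.bulk_revoke(peer.revoked)
        self.workers[hostname] = new
        return new

    def stop_worker(self, hostname):
        # The stopped worker's in-memory revoked set disappears with it.
        del self.workers[hostname]
```

In this model the revoked history survives rolling restarts, but once no running worker holds it (every worker stopped at the same time) it is gone, so a newly started worker gets an empty set.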
asksol

I had to do something similar in my project and used celerycam with django-admin-monitor. The monitor periodically takes a snapshot of tasks and saves it to the database, and it has a nice user interface for browsing and checking the status of all tasks. You can use it even if your project is not Django-based.

Praveen Gollakota

I implemented something similar to this some time ago, and the solution I came up with was very similar to yours.

The way I solved this problem was to have the worker fetch the Task object from the database when the job ran (by passing it the primary key, as the documentation recommends). In your case, before the reminder is sent, the worker should check that the task is "ready" to be run. If not, it should simply return without doing any work (on the assumption that the ETA has changed and another worker will pick up the new job).
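A minimal sketch of that guard, assuming a hypothetical `Reminder` model with a `completed` flag (as suggested in the comments) and assuming the task is given the time it was scheduled for alongside the primary key. A plain dict stands in for the database lookup.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Reminder:
    """Hypothetical model: the reminder time plus a completed flag."""
    id: int
    remind_at: datetime
    completed: bool = False


# Hypothetical stand-in for the database; in a real app this is an ORM lookup by pk.
DB = {}


def send_reminder(reminder_id: int, scheduled_for: datetime) -> bool:
    """Task body: re-read the reminder and do nothing if this run is stale.

    Returns True if the reminder was sent, False if the task was a no-op.
    """
    reminder = DB.get(reminder_id)
    if reminder is None or reminder.completed:
        return False  # deleted, or already handled by another task
    if reminder.remind_at != scheduled_for:
        return False  # rescheduled; a newer task owns this reminder
    # ... deliver the reminder here ...
    reminder.completed = True
    return True
```

Because a stale task becomes a no-op, it no longer matters whether a freshly started worker ever received the revoke.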

Rob Golding
  • Wouldn't this still, in theory, require that I retain all previous Task records in the database, since any pruning would mean losing the guarantee that newly restarted worker processes don't run previously revoked tasks? – Eric Wang Apr 08 '12 at 13:26
  • I'm assuming that you've already got some sort of database model set up, which you're also using to store the task ID so you can revoke the task when necessary? If so, you can just add a `completed` flag to this model. – Rob Golding Apr 08 '12 at 13:29
  • One alternative I just came up with: keep a list of revoked task IDs, and each time a celeryd process is spun up or restarted, have a script loop through the entire list and resend the revoke commands. That way we only have to keep task IDs that have been revoked since the last script run. Can you see any flaws in this implementation? – Eric Wang Apr 08 '12 at 13:30
  • You could probably make that work, but it doesn't strike me as a particularly elegant solution. I'm not sure when you'd be safe to prune the list of revoked tasks, either. My advice would be to give it a go :) – Rob Golding Apr 08 '12 at 13:37
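The replay idea from the comment above could look roughly like this. `RevokeLog` is a hypothetical helper; in practice the IDs would have to live somewhere durable (a database table, for instance), and `revoke` would be something like `app.control.revoke`. It deliberately does not prune, since the thread leaves open when pruning would be safe.

```python
from typing import Callable


class RevokeLog:
    """Hypothetical durable log of revoked task IDs (kept in memory here)."""

    def __init__(self):
        # dict keys keep insertion order (Python 3.7+) and drop duplicates.
        self._ids = {}

    def record(self, task_id: str) -> None:
        """Remember a task ID at the moment it is revoked."""
        self._ids.setdefault(task_id, None)

    def replay(self, revoke: Callable[[str], None]) -> int:
        """Re-send every recorded revoke, e.g. right after a worker (re)starts.

        Returns the number of revokes re-sent.
        """
        for task_id in self._ids:
            revoke(task_id)
        return len(self._ids)
```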