At the moment, djcelery lets me schedule a recurring task via the PeriodicTask model: for example, a task that runs on an interval such as every minute, or on a crontab such as the 1st of every month at noon. What I'd really like to do, however, is schedule a task to start on a fixed date and then repeat on an interval: for example, first run on 3 March 2016 at 2:00 and then every hour thereafter.

Is there a way to achieve this with Django and Celery (with or without djcelery)? Thanks.

fpghost

1 Answer

As stated in the docs, you can define your own custom schedule type by subclassing celery.schedules.schedule. Override the is_due method, which decides whether it is time to run the task.

Below is a proof of concept (I haven't checked it for errors). Note that the __reduce__ method is also overridden so that the new parameter is serialised as well.

from celery.schedules import schedule


class myschedule(schedule):

    def __init__(self, run_every=None, relative=False, nowfun=None,
                 start_date=None, **kwargs):
        # start_date is handled here; the base class does not accept it.
        super(myschedule, self).__init__(run_every, relative, nowfun, **kwargs)
        self.start_date = start_date

    def is_due(self, last_run_at):
        # Hold the task back until the start date has been reached.
        if self.start_date is not None and self.now() < self.start_date:
            return (False, 20)  # try again in 20 seconds
        return super(myschedule, self).is_due(last_run_at)

    def __reduce__(self):
        # The argument order must match __init__ so that start_date survives pickling.
        return self.__class__, (self.run_every, self.relative, self.nowfun, self.start_date)

And then you use it in the config:

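from datetime import datetime, timedelta

# Example start: 3 March 2016 at 02:00. Use a timezone-aware datetime instead
# if self.now() returns aware datetimes in your setup.
start_date = datetime(2016, 3, 3, 2, 0)

CELERYBEAT_SCHEDULE = {
    'add-every-30-seconds': {
        'task': 'tasks.add',
        'schedule': myschedule(timedelta(seconds=30), start_date=start_date),
        'args': (16, 16)
    },
}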
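
To see why __reduce__ has to carry the extra parameter, here is a minimal sketch that does not import Celery at all. BaseSchedule, StartDateSchedule and rebuild are hypothetical stand-ins invented for illustration, not Celery classes or functions. The sketch relies only on how pickle treats a (callable, args) value returned by __reduce__: on load it calls the callable with those arguments.

```python
from datetime import datetime, timedelta


class BaseSchedule(object):
    """Hypothetical stand-in for a schedule class; not Celery's API."""

    def __init__(self, run_every):
        self.run_every = run_every

    def __reduce__(self):
        # Pickle as "call the class again with these constructor arguments".
        return self.__class__, (self.run_every,)


class StartDateSchedule(BaseSchedule):
    """Adds start_date and includes it in the reduced constructor arguments."""

    def __init__(self, run_every, start_date=None):
        super(StartDateSchedule, self).__init__(run_every)
        self.start_date = start_date

    def __reduce__(self):
        # Without this override only run_every would be passed back to
        # __init__, so start_date would silently reset to None.
        return self.__class__, (self.run_every, self.start_date)


def rebuild(obj):
    """Recreate obj the way pickle does for a (callable, args) reduce value."""
    factory, args = obj.__reduce__()
    return factory(*args)


original = StartDateSchedule(timedelta(hours=1),
                             start_date=datetime(2016, 3, 3, 2, 0))
restored = rebuild(original)
```

If the override in StartDateSchedule is removed, rebuild still succeeds, but restored.start_date comes back as None, which is the failure mode the answer's __reduce__ is guarding against.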
ldmberman
  • Thanks. This is helpful and close to what I want; however, I also need it to be a database scheduler (currently I have a frontend interface for users to create PeriodicTask model instances to launch jobs). I guess I need to do something similar, but perhaps with `celery.beat.Scheduler` being overridden? (Similar to how djcelery does it now in `djcelery.schedulers.DatabaseScheduler`, but maybe overriding `is_due` there?) – fpghost Feb 15 '16 at 07:43
  • Yes, the database backend makes things a bit more complicated. Indeed, in order to avoid the mess of teaching `django-celery` to work with custom schedules like in my answer above I would rather update/override the `PeriodicTask` model to contain a `start_date` field and override `DatabaseScheduler` so that it takes it into account:

    ```
    class MyScheduler(DatabaseScheduler):
        def is_due(self, entry):
            if entry.schedule.now() < entry.model.start_date:
                return False, 20  # try again in 20s
            return super(MyScheduler, self).is_due(entry)
    ```

    – ldmberman Feb 15 '16 at 09:03
  • Where would be the correct place to override `PeriodicTask`? Should I just fork djcelery or is there a better way? – fpghost Feb 15 '16 at 09:09
  • Or I guess if I just override `DatabaseScheduler` I can point to it in settings.py, and in my CustomDatabaseScheduler I set `Model` and `Entry` to my custom versions. – fpghost Feb 15 '16 at 09:22
  • You can create your own model without touching djcelery and add the `start_date` field for it. Also, you would need to [register the signals](https://github.com/celery/django-celery/blob/master/djcelery/models.py#L290) in this case. Alternatively, you can update djcelery (`PeriodicTask` and `DatabaseScheduler`) because this looks like a useful feature. The latter looks like a better option to me. The only problem though is `djcelery` seems to be abandoned - your PR will hardly get through any time soon. – ldmberman Feb 15 '16 at 09:29
  • But if I just created my own `CustomPeriodicTask` model how would I wire it up so `djcelery` is actually using it? I guess the signals take care of that, but then also the old `PeriodicTask` would still be wired in (which possibly isn't a problem). Nevertheless, I'd probably still need to override the `is_due` of `ModelEntry` to actually make it use the new `start_date` field...The latter is probably the better idea anyway as you say. – fpghost Feb 15 '16 at 09:36
  • If you create `CustomPeriodicTask(PeriodicTask)` it would refer to the same table in the database and its instances would also come up in `PeriodicTask.objects.all`. So, `djcelery` will perfectly know about it. Talking about `is_due`, you can override it either in `ModelEntry` or in `DatabaseScheduler` because the latter receives an entry as an argument of the `is_due` method so you can get the date as `entry.model.start_date`. But yes, I would rather update `djcelery` instead. – ldmberman Feb 15 '16 at 09:50
  • @Idmberman thanks so much for the help on this. I think I've pretty much got what I wanted written. One issue, though, before I accept your answer: the `start_date` as implemented here is not really a start date, right? If the task interval were 50 days, it could still be 50 days after the `start_date` before the task first runs? – fpghost Feb 16 '16 at 09:04
  • According to the docs, if you schedule a task to run every 50 days, it will first be sent 50 days after beat starts, and then every 50 days after the last run. Here we simply do not let it start until the start date. If you also want it to be sent on the start date itself, you can update `is_due` to check whether `now` falls in the half-open interval `(start_date - step / 2, start_date + step / 2]`, where `step` is how often `is_due` is called. – ldmberman Feb 16 '16 at 09:44
  • Yeah, running on the start date is preferable. I did it (I think) by just adding an `elif` to your `if`: `elif self.model.start_date is not None and (self.last_run_at is None or self.last_run_at < self.model.start_date):  # start`. That way, if the start date is earlier than now and the previous run was before the start date (or never), we know it should be run. – fpghost Feb 16 '16 at 10:10
  • Yes, your solution looks much clearer than what I've proposed – ldmberman Feb 16 '16 at 11:34
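
The logic the comments converge on can be sketched as a plain function, independent of Celery and djcelery. The names is_due, in_start_window and the retry parameter are hypothetical and chosen only for this illustration; the real hook would live in a ModelEntry or DatabaseScheduler subclass as discussed above. One simplification to note: with no start date, a task that has never run is treated as due immediately, whereas beat normally initialises the last run time itself.

```python
from datetime import datetime, timedelta


def is_due(now, last_run_at, run_every, start_date=None, retry=20):
    """Return (due, seconds_until_next_check). Illustration only, not Celery code."""
    if start_date is not None:
        if now < start_date:
            # Start date not reached yet: poll again shortly.
            return False, retry
        if last_run_at is None or last_run_at < start_date:
            # First check on or after the start date: run right away.
            return True, run_every.total_seconds()
    if last_run_at is None:
        # Simplifying assumption: never-run tasks without a start date are due.
        return True, run_every.total_seconds()
    remaining = (last_run_at + run_every - now).total_seconds()
    if remaining <= 0:
        return True, run_every.total_seconds()
    return False, remaining


def in_start_window(now, start_date, step):
    """True if now lies in (start_date - step/2, start_date + step/2]."""
    half = step / 2
    return start_date - half < now <= start_date + half


# The scenario from the question: first run on 3 March 2016 at 02:00, then hourly.
start = datetime(2016, 3, 3, 2, 0)
hour = timedelta(hours=1)
before = is_due(datetime(2016, 3, 3, 1, 59), None, hour, start)
at_start = is_due(start, None, hour, start)
half_past = is_due(datetime(2016, 3, 3, 2, 30), start, hour, start)
next_hour = is_due(datetime(2016, 3, 3, 3, 0), start, hour, start)
```

The `last_run_at < start_date` branch is what addresses the 50-day concern: a task whose previous run predates the start date fires as soon as the start date passes, instead of waiting a full interval. The window check is the alternative suggested in the thread and only works if `is_due` is polled at least once per `step`.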