Celery doesn't support this out of the box, but I've had to do similar things in the past and stumbled through a solution on my own.
In my experience, there are two somewhat straightforward ways to achieve this, both with trade-offs. There are also some pretty big holes you can step into with this stuff, so caveat emptor.
Option 1:
Use some datastore to save information about when the task should next run, and have a celery beat task check it.
To do this, you could use your database and a model that holds some information about the periodic task. (If you wanted to get more technical, you could probably talk to the queue directly and skip the models route entirely.)
from django.db import models


class PeriodicTask(models.Model):
    lastrun = models.DateTimeField()
    nextrun = models.DateTimeField()
    notes = models.TextField(blank=True)  # errors? debugging breadcrumbs?
    task_id = models.CharField(max_length=100, blank=True)
That's just kind of a rough idea of what the model might store. You can put whatever is useful on there, but you'll need a datetime object to store when the next run should be.
Next, you need a periodic task that runs much more frequently than the 30-minute interval, spins up, and checks whether any tasks need to be executed soon:
import datetime

from celery.task import periodic_task

from .models import PeriodicTask


@periodic_task(run_every=datetime.timedelta(minutes=2),
               queue='activities', options={'queue': 'activities'})
def pull_activities_frequent_adaptors():
    now = datetime.datetime.utcnow()  # need to be clear about time zones
    scheduled_tasks = PeriodicTask.objects.filter(nextrun__gte=now)
    if scheduled_tasks.count() == 1:  # more than one and we've erred somewhere
        timewindow = datetime.timedelta(minutes=5)
        if (scheduled_tasks[0].nextrun - now) <= timewindow:
            scheduled_tasks[0].delete()
            # Do the task...
            # ...then schedule the next one
            PeriodicTask.objects.create(
                lastrun=now,
                nextrun=now + datetime.timedelta(minutes=30))
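One thing the code above glosses over: nothing ever fires until a first row exists, so you have to seed the table once. Here's a minimal sketch, assuming the model above (bootstrap_schedule is just a name I made up; run it once from a shell or a management command):

import datetime

from .models import PeriodicTask


def bootstrap_schedule():
    # Seed the chain with the single row the polling task looks for.
    # Guard against starting a second chain if one already exists.
    if not PeriodicTask.objects.exists():
        now = datetime.datetime.utcnow()
        PeriodicTask.objects.create(
            lastrun=now,
            nextrun=now + datetime.timedelta(minutes=30))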
Potential issues:
1) If you have multiple databases in a master-slave setup, and in particular if you have replication lag, you could end up double-scheduling things (even with the count() == 1 check). So there's a race condition worth thinking about; see the locking sketch after this list.
2) It's hard to get close to exactly 30 minutes because you have to use a time window to find tasks to execute.
3) The polling task needs to run more often than your time window is wide, otherwise you could miss a scheduled run. That's potentially a waste of resources (though not a terrible one, I suppose), because it usually spins up and does nothing.
4) Nothing boggles the mind more than dealing with datetimes, so you really have to consider time zones (using timezone-aware datetimes everywhere, e.g. django.utils.timezone.now(), helps), think about all the variations, and test the hell out of this code.
5) This one is a biggie: if the task takes longer to run than the interval it's scheduled for, then you'll have two tasks running concurrently, which is a problem. Again, with race conditions, things can get dicey (the locking sketch below helps here too).
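For issues 1 and 5, you can shrink (though not eliminate) the race window with a row lock. This is only a sketch, assuming a database that supports SELECT ... FOR UPDATE (Postgres, MySQL/InnoDB) and that these reads hit the master, not a lagging slave; claim_due_task is a helper name I made up:

from django.db import transaction

from .models import PeriodicTask


def claim_due_task(now, timewindow):
    # Returns True if this worker won the right to run the task.
    # select_for_update() locks the row, so two workers can't both
    # delete it and do the work.
    with transaction.atomic():
        due = (PeriodicTask.objects
               .select_for_update()
               .filter(nextrun__lte=now + timewindow))
        task = due.first()
        if task is None:
            return False
        task.delete()
        return True

The polling task would call this and only do the work (and create the next row) if it returns True.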
Option 2:
Don't use celery beat: fire off the first task and have it fire off the next one after 30 minutes. This has the potential to become a runaway sorcerer's-apprentice type of thing, so I find it a bit, um, scary; while I have done the first option, I've never really talked myself into this one. But I think it could be done:
from celery.task import task


@task  # no longer a periodic task
def your_task(args):
    # Whatever you want to do, then call itself again...
    # (note the trailing comma: args must be a tuple or list)
    your_task.apply_async(args=(args,), countdown=1800)  # 1800s = 30 minutes
Now you just need to call this somewhere, probably from a cron job that spins up once a week, kills any previous versions of this thing (how does it find them?), and then fires off the first one.
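As for "how does it find them?": one possibility is to record the pending task's id somewhere shared every time the chain re-fires (Django's cache, say), and have the cron job revoke whatever id is stored before kicking off a fresh chain. You'd extend your_task above along these lines. Again just a sketch; CHAIN_KEY and kickoff are names I made up, and revoked states aren't persistent across worker restarts unless you run the workers with a state db:

from celery.task import task
from celery.task.control import revoke
from django.core.cache import cache

CHAIN_KEY = 'your_task_chain_id'  # made-up cache key


@task
def your_task(args):
    # ... do the work, then re-fire and remember the pending task's id
    result = your_task.apply_async(args=(args,), countdown=1800)
    cache.set(CHAIN_KEY, result.id, None)  # None = cache forever


def kickoff(args):
    # Called from the weekly cron job: revoke the pending link in the
    # old chain (if any), then start a new one.
    old_id = cache.get(CHAIN_KEY)
    if old_id:
        revoke(old_id)
    result = your_task.apply_async(args=(args,))
    cache.set(CHAIN_KEY, result.id, None)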
I have to say that I don't really like this answer and even though it's occurred to me a handful of times, it seems like a more dangerous and unruly way to address the problem. I'd be curious if anyone does it, though.