88

For a task like this:

from celery.decorators import task

@task()
def add(x, y):
    if not x or not y:
        raise Exception("test error")
    return self.wait_until_server_responds(

If it throws an exception and I want to retry it from the daemon side, how can I apply an exponential backoff algorithm, i.e. retry after 2^2, 2^3, 2^4, etc. seconds?

Also, is the retry maintained on the server side, so that if the worker happens to get killed, the next worker that spawns will pick up the retry task?

tshepang
Quintin Par

3 Answers

160

The task.request.retries attribute contains the number of retries so far (0 on the first attempt), so you can use this to implement exponential back-off:

from celery.task import task

@task(bind=True, max_retries=3)
def update_status(self, auth, status):
    try:
        Twitter(auth).update_status(status)
    except Twitter.WhaleFail as exc:
        raise self.retry(exc=exc, countdown=2 ** self.request.retries)
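The delays this expression produces can be checked without running a worker. A minimal sketch, assuming a hypothetical helper `countdown_for` (not part of Celery) that mirrors `2 ** self.request.retries`; the `offset` parameter is an assumption added here to reach the 2^2, 2^3, 2^4 schedule from the question:

```python
def countdown_for(retries, base=2, offset=0):
    """Return the delay in seconds before the next retry.

    `retries` plays the role of self.request.retries: 0 on the first
    failure, 1 on the second, and so on. Hypothetical helper, not Celery API.
    """
    return base ** (retries + offset)


# With max_retries=3 the task is retried at most three times.
print([countdown_for(r) for r in range(3)])            # [1, 2, 4]
print([countdown_for(r, offset=2) for r in range(3)])  # [4, 8, 16]
```

In the task itself, the second schedule would correspond to something like `countdown=2 ** (self.request.retries + 2)`.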

To prevent a Thundering Herd Problem, you may consider adding a random jitter to your exponential backoff:

import random
self.retry(exc=exc, countdown=int(random.uniform(2, 4) ** self.request.retries))
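To see what range of delays this jitter gives, here is a small sketch that samples the same expression outside of Celery. The helper name `jittered_countdown` and the seeded generator are assumptions for illustration only:

```python
import random


def jittered_countdown(retries, rng=random):
    """Delay in whole seconds, using a random base between 2 and 4."""
    return int(rng.uniform(2, 4) ** retries)


rng = random.Random(0)  # seeded only to make the sketch repeatable
for retries in range(4):
    samples = [jittered_countdown(retries, rng) for _ in range(1000)]
    # Every sample lies between 2**retries and 4**retries.
    assert all(2 ** retries <= s <= 4 ** retries for s in samples)
    print(retries, min(samples), max(samples))
```

Note that for `retries == 0` the result is always 1, so the first retry is not jittered by this formula; the spread only appears from the second retry onward.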
fjsj
asksol
  • Do you know if this is a server side retry or the client is kept to wait? If the client is kept to wait then it’s bad. – Quintin Par Mar 18 '12 at 05:18
  • 2
    As far as I know the countdown attribute sets an eta for the task at the MQ backend (e.g. RabbitMQ). So it is not set on the client side. – idanzalz Nov 28 '12 at 13:24
  • client is not kept to wait unless you do `result.get()` which is an explicit request to wait for the result to be ready, but there's also a timeout argument and there's a RETRY state so you can check if the task is being retried (and what the reason for the retry was) – asksol Nov 30 '12 at 12:12
  • 10
    For celery 3.1, you should use `@task(bind=True)` and celery will pass `self` into the function as the first argument, so you would change the args to be `def update_status(self, auth, status):` which then gives you access to `self.retries` – robbyt Dec 20 '13 at 02:27
  • 3
    thanks @robbyt ! just a small correction - [`retries` is an attribute of `request`](http://celery.readthedocs.org/en/latest/userguide/tasks.html#context), so `self.request.retries` is the proper call. – tutuDajuju May 20 '15 at 12:43
  • Can you add an example how to call the task with `@task(bind=True)`, Normally I would just, `from tasks.py import update_status; update_status(auth, status)` but what should I pass in for `self`? – Matt Sep 25 '17 at 16:19
  • Please see other answer to this question for a built-in answer instead: https://stackoverflow.com/a/46467851/9190640 – jorf.brunning Jun 04 '21 at 01:57
56

As of Celery 4.2 you can configure your tasks to use an exponential backoff automatically: http://docs.celeryproject.org/en/master/userguide/tasks.html#automatic-retry-for-known-exceptions

@app.task(autoretry_for=(Exception,), retry_backoff=2)
def add(x, y):
    ...

(This was already in the docs for Celery 4.1 but actually wasn't released then, see merge request)
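As a rough illustration of what `retry_backoff=2` means, the sketch below computes the delay schedule in plain Python. It assumes the rule described in the Celery documentation: a numeric `retry_backoff` is used as a delay factor that doubles on each retry and is capped by `retry_backoff_max` (600 seconds by default, as far as I know). The function `autoretry_delays` is hypothetical and not part of Celery:

```python
def autoretry_delays(retry_backoff, max_retries=3, retry_backoff_max=600):
    """Upper bound, in seconds, on the delay before each automatic retry.

    Hypothetical helper that mirrors the documented rule
    factor * 2**n, capped at retry_backoff_max.
    """
    return [
        min(retry_backoff_max, retry_backoff * 2 ** n)
        for n in range(max_retries)
    ]


print(autoretry_delays(2))                  # [2, 4, 8]
print(autoretry_delays(2, max_retries=10))  # last value is capped at 600
```

With `retry_jitter` left enabled, these values act as maximums: the actual delay is drawn at random between zero and each value.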

  • 3
    Nice catch, scratching my heads in 4.1.0, why my parameter of "retry_backoff" not respected. – kororo Apr 30 '18 at 08:24
  • 2
    @kororo it doesn't seem to work with `self.retry`, only other exception types – rdrey Aug 10 '18 at 14:12
  • With this approach you also benefit from the built in `retry_jitter` (defaulted to `True`) which avoids the Thundering Herd Problem mentioned in asksol's answer – qwertysmack Nov 17 '20 at 15:26
  • This is the correct answer given that it is built-in, and does not require manually handling countdown – jorf.brunning Jun 04 '21 at 01:57
  • Does this also work when `retry()` is called? It doesn't seem to work for non-automatic retries (on Celery 4.2.2 at least). Anyone has any idea? – Sarang Jan 17 '22 at 12:53
3

FYI, Celery has a utility function, get_exponential_backoff_interval (in celery.utils.time), that calculates the exponential backoff time with optional jitter, so you don't need to write your own.

import random

def get_exponential_backoff_interval(
    factor,
    retries,
    maximum,
    full_jitter=False
):
    """Calculate the exponential backoff wait time."""
    # Will be zero if factor equals 0
    countdown = min(maximum, factor * (2 ** retries))
    # Full jitter according to
    # https://www.awsarchitectureblog.com/2015/03/backoff.html
    if full_jitter:
        countdown = random.randrange(countdown + 1)
    # Adjust according to maximum wait time and account for negative values.
    return max(0, countdown)
boatcoder
lgylym
  • 2
    In the future, avoid link-only answers, as links tend to go stale over time. Best to also include a code snippet and explanation in your answer for maximum upvotes and value-add. Edit: case in point, this answer's link is already broken https://stackoverflow.com/a/46467851/366529 – dKen Apr 08 '22 at 06:11