
I have a few tasks like this:

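from time import sleep

from celery import chain, chord, group

# `celery` is assumed to be an already-configured Celery app instance.
@celery.task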
def generate():
    sleep(1.0)
    print "Generate done!"
    return 'result'

@celery.task
def lower(result):
    sleep(1.0)
    print "Lower done!"
    return result.lower()

@celery.task
def upper(result):
    sleep(1.0)
    print "Upper done!"
    return result.upper()

@celery.task
def upload(result):
    sleep(1.0)
    print "Upload done for: %s!" % (result,)
    return 'upload'

@celery.task
def callback(results):
    print "It's all done! %s" % (results,)

I'm creating a chord that looks like this:

chord(
    header=chain(
        generate.s(),
        group(
            chain(lower.s(), upload.s()),
            chain(upper.s(), upload.s())
        )
    ), body=callback.s()
).delay()

The problem that I'm experiencing is that my callback, which is supposed to fire after all tasks have been completed, seems to fire right after generate.

In case it's not clear, the workflow is like this:

  1. Generate a result, then pass it to each member of a group so that the two branches run in parallel:
    1. The first branch takes the result from generate, converts it to lowercase using lower, and then uploads it using upload.
    2. The second branch takes the result from generate, converts it to uppercase using upper, and then uploads it using upload.
  2. After all of this is done, the callback task should be called.
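
To make the intended ordering concrete, here is a rough model of the same workflow using only the Python standard library. It is an analogy, not Celery code: the thread pool stands in for the workers, `branch` and `run_workflow` are hypothetical helper names, the sleeps are shortened via an assumed `STEP` constant, and `upload` returns a more descriptive string than the original task so that the two branches can be told apart.

```python
import time
from concurrent.futures import ThreadPoolExecutor

STEP = 0.05  # assumed stand-in for the 1.0 s sleeps in the real tasks


def generate():
    time.sleep(STEP)
    return "result"


def lower(value):
    time.sleep(STEP)
    return value.lower()


def upper(value):
    time.sleep(STEP)
    return value.upper()


def upload(value):
    time.sleep(STEP)
    # The original task returns 'upload'; the value is echoed here for clarity.
    return "uploaded %s" % value


def branch(transform, value):
    # One group member: transform the value, then upload it (a two-step chain).
    return upload(transform(value))


def run_workflow():
    started = time.monotonic()
    value = generate()
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(branch, f, value) for f in (lower, upper)]
        # result() blocks until each branch has finished, so whatever runs
        # after this line plays the role of the callback.
        results = [f.result() for f in futures]
    elapsed = time.monotonic() - started
    return results, elapsed


results, elapsed = run_workflow()
print("It's all done! %s after %.2fs" % (results, elapsed))
```

In this model the final print cannot run until both branches have uploaded, which is the behaviour expected from the callback: roughly three sleep intervals (generate, transform, upload) must pass first.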

Expected

The callback task will be called at least 3 seconds after starting.

Actual

The callback task is called around 1 second after starting, and does not wait for members of the group to finish executing.

Here are the logs proving that it doesn't wait for groups:

[2013-11-17 18:20:40,447: WARNING/PoolWorker-8] Generate done!
[2013-11-17 18:20:41,493: WARNING/PoolWorker-6] Upper done!
[2013-11-17 18:20:41,493: WARNING/PoolWorker-1] Lower done!
[2013-11-17 18:20:41,535: WARNING/PoolWorker-6] It's all done! [('e0016a35-d538-4e96-ad86-6ddf91ef4a09', [('b1af78a9-7935-4037-84e4-9fae6d7c027e', None), ('d69c4c99-af9c-476f-af7d-7f647c4d9c83', None)])]
[2013-11-17 18:20:42,522: WARNING/PoolWorker-7] Upload done for: result!
[2013-11-17 18:20:42,523: WARNING/PoolWorker-5] Upload done for: RESULT!

It seems that Celery doesn't wait on groups. Is there a way to have Celery wait until ALL tasks, including members of groups, are finished executing?

Naftuli Kay

1 Answer


You're using a chain as the chord header here, but the header must be a group:

chord(
    header=chain(
        generate.s(),
        group(
            chain(lower.s(), upload.s()),
            chain(upper.s(), upload.s())
        )
    ), body=callback.s()
).delay()

With chain(generate.s(), group(...)), there is nothing to synchronize with, as the group happens in parallel.

Your workflow could better be expressed like this:

filters = group(lower.s() | upload.s(),
                upper.s() | upload.s())
result = (generate.s() | filters | callback.s())()

Note: chain(group, sig) is automatically converted to a chord.

asksol
  • The only reason I haven't just used a chain until this point is that I need a callback that will _always_ fire, as my `callback` task does cleanup and reporting on what worked and what didn't. With your example above, will the `callback` always be triggered? – Naftuli Kay Nov 18 '13 at 17:28