
If I understood the tutorial correctly, a Celery subtask supports almost the same API as a task, with the additional advantage that it can be passed around to other functions or processes.

Clearly, if that were the case, Celery would have simply replaced tasks with subtasks instead of keeping both (e.g., the @app.task decorator would convert a function to a subtask instead of a task). So I must be misunderstanding something.

What can a task do that a subtask can't?

The Celery API has changed quite a bit; my question is specific to version 3.1 (currently the latest).

Edit:

I know the docs say subtasks are intended to be called from other tasks. My question is what prevents Celery from getting rid of tasks completely and using subtasks everywhere? They seem to be strictly more flexible/powerful than tasks:

# tasks.py
from celery import Celery
app = Celery(backend='rpc://')

@app.task
def add(x, y):
    # just print out a log line for testing purposes
    print(x, y)

# client.py
from tasks import add
add_subtask = add.subtask()
# in this context, it seems the following two lines do the same thing
add.delay(2, 2)
add_subtask.delay(2, 2)
# when we need to pass the call as an argument to other tasks, we must use add_subtask
# so it seems add_subtask is strictly better than add
max
  • I think subtasks are secondary tasks run from a task. http://stackoverflow.com/questions/6349371/celery-task-that-runs-more-tasks – Hussain Oct 02 '16 at 20:23
  • @Hussain Yes, precisely. See my updated question. – max Oct 02 '16 at 20:35

1 Answer


The difference starts to matter once you use complex workflows with Celery.

A signature() wraps the arguments, keyword arguments, and execution options of a single task invocation in a way such that it can be passed to functions or even serialized and sent across the wire.

Signatures are often nicknamed “subtasks” because they describe a task to be called within a task.

Also:

subtasks are objects used to pass around the signature of a task invocation (for example, to send it over the network)

A task is just a function definition wrapped with a decorator, whereas a subtask is a task with its parameters already bound but not yet started. You can send the serialized subtask over the network or, more commonly, use it within a group/chain/chord.
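To make the distinction above concrete, here is a toy model in plain Python. It is not Celery's implementation: `ToyTask`, `ToySignature`, `REGISTRY`, and the `run` method are all hypothetical names invented for this sketch. It only illustrates the idea that a task owns the code, while a signature is dict-like data (task name plus arguments) that can be serialized, sent elsewhere, and resolved back to the task by name.

```python
import json

# Hypothetical registry standing in for the set of tasks a worker knows about.
REGISTRY = {}


class ToyTask:
    """Owns the function body; this is what actually gets executed."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__
        REGISTRY[self.name] = self

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def s(self, *args, **kwargs):
        # Describe one invocation without running it.
        return ToySignature(task=self.name, args=list(args), kwargs=kwargs)


class ToySignature(dict):
    """Plain data: a task name plus bound arguments. It contains no code."""

    def run(self, *extra_args):
        # Resolve the task by name, prepend late-bound arguments, then call it.
        task = REGISTRY[self["task"]]
        return task(*extra_args, *self["args"], **self["kwargs"])


@ToyTask
def add(x, y):
    return x + y


sig = add.s(2, 2)                          # arguments bound, nothing executed yet
wire = json.dumps(sig)                     # the signature is serializable data
restored = ToySignature(json.loads(wire))  # rebuilt on the "other side"
result = restored.run()                    # resolved through the registry by name

partial = add.s(10)                        # only one argument bound so far
chained = partial.run(result)              # earlier result supplied as first argument
```

In this model the signature cannot replace the task: `restored.run()` only works because a task named `add` is registered somewhere to hold the function body. The last two lines mimic, under the same assumptions, how a chain-style workflow feeds one result into a partially bound signature.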

baldr
  • Hmm, that is clear, but I somehow still don't understand the answer to my original question: aren't subtasks strictly more powerful/flexible than tasks? Why even bother having tasks, why not remove that concept from Celery, and only deal with subtasks? Unlike tasks, subtasks can be passed around, can be partialized, can be combined in complex workflow. And yet, subtasks also have `.delay()` method just like tasks. Is there anything that a task can do that subtask can't? – max Oct 03 '16 at 17:43
  • As far as I understand it, yes: subtasks look more flexible because they let you organize workflows. But if you do not use complex 'canvas' workflows, you can use just tasks. A task is a higher-level abstraction than a subtask. A subtask is nothing but a serialized task. – baldr Oct 03 '16 at 17:54
  • How is task a higher-level abstraction? Based on my understanding, and on your answer, *task* can do a tiny fraction of what *subtask* can do, and can do nothing that *subtask* can't... Why would I want to use a crippled object, when the much more powerful object is just as easy to use? – max Oct 03 '16 at 18:21
  • First of all, I guess `task` historically came before `subtask`. Also, `subtask` is just an alias for `signature`, as we can see from the code. They have different goals: `task` is a class implementing the functions that manage the execution process, while `subtask` is that class technically wrapped in a `dict`-like object. When you use the `@task` decorator on a function, you get an object back instead, and it simply implements a `subtask` method. You *can* use either approach, but in Celery they are different entities. – baldr Oct 03 '16 at 21:06
  • History: understood; but are you implying *subtask* isn't backward compatible with *task*? How so? The API seems the same, what am I missing? – max Oct 04 '16 at 08:13
  • The same API does not mean the same objects. Subtask is just a packed task. – baldr Oct 04 '16 at 12:49
  • The Celery nomenclature is inconsistent and obfuscated. So-called "subtasks" can be called directly (not just from within tasks), and tasks can be called from within tasks without using signatures at all. It's a friggin mess. – odigity Apr 29 '22 at 14:05