
I want to create a group from a list returned by a Celery task, so that for each item in the task result set, one task will be added to the group.

Here's a simple code example to explain the use case. The ??? should be replaced with the result of the previous task.

@celery.task
def get_list(amount):
    # In reality, fetch a list of items from a db
    return [i for i in range(amount)]

@celery.task
def process_item(item):
    #do stuff
    pass

process_list = (get_list.s(10) | group(process_item.s(i) for i in ???))

I'm probably not approaching this correctly, but I'm pretty sure it's not safe to call tasks from within tasks:

@celery.task
def process_list():
    for i in get_list.delay().get():
        process_item.delay(i)

I don't need the result from the second task.

OmerGertel
    Indeed, do *not* call a task from a task. This will cause deadlocks. Say you have one worker. You call your task, which ties up worker 1, then calls a second task. There's no worker to process that task and everything will hang. This nastiness gets slightly better as you add workers, but you'll always be tying up multiple workers with a single task (and losing parallelism). – mlissner Nov 14 '16 at 21:13
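The deadlock described in the comment above can be sketched without Celery, using Python's standard-library `concurrent.futures` with a single-worker pool as a stand-in for a one-worker Celery deployment (the names and timeouts here are purely illustrative; a real Celery worker would simply hang rather than time out):

```python
# Plain-Python analogue of the task-calls-task deadlock, using a one-worker
# thread pool instead of a one-worker Celery deployment.
from concurrent.futures import ThreadPoolExecutor, TimeoutError

pool = ThreadPoolExecutor(max_workers=1)

def inner():
    return 42

def outer():
    # The sole worker is busy running outer(), so inner() can never start;
    # waiting on its result would block forever (cut short here by a timeout).
    return pool.submit(inner).result(timeout=1)

try:
    outcome = pool.submit(outer).result(timeout=3)
except TimeoutError:
    outcome = "deadlock"
```

With more workers the hang disappears, but each such nested call still ties up one extra worker for the duration, which is the lost parallelism the comment warns about.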

1 Answer


You can get this kind of behavior using an intermediate task. Here's a demonstration of creating a "map"-like method that works the way you've suggested.

from celery import task, subtask, group

@task
def get_list(amount):
    return [i for i in range(amount)]

@task
def process_item(item):
    # do stuff
    pass

@task
def dmap(it, callback):
    # Map a callback over an iterator and return as a group
    callback = subtask(callback)
    return group(callback.clone([arg,]) for arg in it)()

# runs process_item for each item in the list returned by get_list
process_list = (get_list.s(10) | dmap.s(process_item.s()))

Credit to Ask Solem for giving me this suggestion when I asked him for help on a similar issue.
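For intuition, here is a synchronous, Celery-free sketch of what the chain above computes: `dmap` applies the callback to every item of `get_list`'s result. The difference under Celery is that each callback invocation becomes its own task, so the group's members can run in parallel across workers:

```python
# Synchronous stand-in for the Celery chain: get_list feeds dmap, which
# applies the callback to each item. Under Celery each callback call is a
# separate task; here it is a plain function call.
def get_list(amount):
    return list(range(amount))

def process_item(item):
    return item * 2  # stand-in for real work

def dmap(it, callback):
    return [callback(arg) for arg in it]

results = dmap(get_list(4), process_item)  # [0, 2, 4, 6]
```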

todd
  • Note that clone only does a shallow copy. If you want to clone a "complex" signature (like a chain, group or chord), you will need to either (ab)use Python's deepcopy, as mentioned in [celery issue 2251](https://github.com/celery/celery/issues/2251), or move `callback = subtask(callback)` into the for-loop and drop the `clone`. – Luis Nell Apr 11 '16 at 18:21
  • I've read the above comment about a dozen times and I don't get it. Could you provide an example, @LuisNell? – mlissner Mar 17 '17 at 19:01
  • @mlissner Given the above code, what I mean is the following. If we assume "callback" is not simply a single task, but rather a complex workflow (a group or a chord), you can't simply use `.clone()`. Groups and chords might be very complex (a group of groups etc.). In that case you can't simply use `.clone`, because that only creates a shallow copy of your callback signature. This means that arguments won't be passed on correctly. To make sure everything works as expected, you need to use `deepcopy`, as mentioned in my original comment – does that make it more clear? if not, i'll try again. – Luis Nell Mar 19 '17 at 13:07
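The shallow- vs deep-copy distinction behind that comment can be illustrated with plain Python's `copy` module, using a nested dict as a stand-in for a nested signature (the structure here is illustrative, not Celery's actual internal representation):

```python
# copy.copy shares nested objects; copy.deepcopy does not. A nested dict
# stands in for a "complex" signature such as a group of groups.
import copy

signature = {"task": "process_item", "tasks": [{"args": []}]}

shallow = copy.copy(signature)
deep = copy.deepcopy(signature)

# Mutating a nested element through the shallow copy leaks into the original:
shallow["tasks"][0]["args"].append(1)
leaked = signature["tasks"][0]["args"]  # now [1] - the inner list is shared

# The deep copy's nested elements are independent:
deep["tasks"][0]["args"].append(2)
original_args = signature["tasks"][0]["args"]  # still [1], unaffected
```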
  • 1
    If I understand this correctly, then the `dmap` function will execute the group synchronously, so two tasks go through the broker, whereas normally the `group()` would cause the individual `process_item` functions to be called in parallel. If that's right, is there any difference to `return [process_item(i) for i in it]`? – mjtamlyn May 25 '17 at 13:52
  • 1
    Nit pick: isn't `[i for i in range(n)]` equivalent to just `range(n)`? And should list `[arg,]` be tuple `(arg,)`? – quantoid Feb 25 '18 at 22:59
  • 2
    I've tried to do a two level version of this and it's not working. I've opened a new question at https://stackoverflow.com/q/59013002/3189 - any insights appreciated. – Hamish Downer Nov 23 '19 at 22:50
  • The given answer works for me, but I need to add one more task after process_item: https://stackoverflow.com/questions/62676732/celery-chain-task-on-group. Any pointers? – navyad Jul 01 '20 at 12:23
  • The list `[arg, ]` should be changed to a tuple `(arg, )` . Otherwise you would not be able to pass additional arguments to `process_item(item, add_argum1, add_argum2)` which you might really need to process your data. Then you could put `process_list = (get_list.s(10) | dmap.s(process_item.s(additional_argument1, additional_argument2)))` – IAmBotmaker Feb 16 '21 at 13:27
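The reason the tuple matters can be sketched in plain Python: combining a cloned argument with additional partial arguments amounts to tuple concatenation, and `list + tuple` raises a `TypeError` (the exact Celery internals are more involved; this only shows the concatenation rule the comment relies on):

```python
# Tuple concatenation works; mixing a list with a tuple does not.
cloned_args = (3,)                 # what clone((arg,)) would carry
extra_args = ("argum1", "argum2")  # extra args from process_item.s(...)

combined = cloned_args + extra_args  # (3, 'argum1', 'argum2')

try:
    bad = [3] + extra_args           # list + tuple raises TypeError
except TypeError:
    bad = None
```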