Using nested asyncio.gather() inside another asyncio.gather()

Question

I have a class with various methods. One of them looks something like this:

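import asyncio

class MyClass:

    async def master_method(self):
        # outer level: run 10 sub_method coroutines concurrently
        tasks = [self.sub_method() for _ in range(10)]
        results = await asyncio.gather(*tasks)
        return results

    async def sub_method(self):
        # inner level: each sub_method gathers 10 tasks of its own
        subtasks = [self.my_task() for _ in range(10)]
        results = await asyncio.gather(*subtasks)
        return results

    async def my_task(self):
        return "task done"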

So the question here is:

Are there any issues, advantages or disadvantages with using asyncio.gather() inside coroutines that are themselves called from another asyncio.gather()? Any performance issues?
Are tasks at all levels treated with the same priority by the asyncio event loop? Would this give the same performance as calling all the coroutines with a single asyncio.gather() from the master_method?

MisterMiyagi · Accepted Answer · 2021-11-03T07:56:21.653

TLDR: Using gather instead of returning tasks simplifies usage and makes code easier to maintain. While gather has some overhead, it is negligible for any practical application.

Why `gather`?

The point of using gather to collect child tasks before exiting a coroutine is to delay the completion of the coroutine until its child tasks are done. This encapsulates the implementation, and ensures that the coroutine appears as one single entity "doing its thing".
The alternative is to return the child tasks, and expect the caller to run them to completion.

For simplicity, let's look at a single layer – corresponding to the intermediate sub_method – but in different variations.

import asyncio
from typing import Awaitable, List

async def child(i):
    await asyncio.sleep(0.2)  # some non-trivial payload
    print("child", i, "done")

async def encapsulated() -> None:
    await asyncio.sleep(0.1)  # some preparation work
    children = [child(i) for i in range(10)]
    await asyncio.gather(*children)

async def task_children() -> 'List[asyncio.Task]':
    await asyncio.sleep(0.1)  # some preparation work
    children = [asyncio.create_task(child(i)) for i in range(10)]
    return children

async def coro_children() -> 'List[Awaitable[None]]':
    await asyncio.sleep(0.1)  # some preparation work
    children = [child(i) for i in range(10)]
    return children

All of encapsulated, task_children and coro_children in some way encode that there are sub-tasks. This allows the caller to run them in such a way that the actual goal is "done" reliably. However, each variant differs in how much it does by itself and how much the caller has to do:

The encapsulated variant is the "heaviest": all children are run in Tasks and there is an additional gather. However, the caller is not exposed to any of this:
```
await encapsulated()
```
This guarantees that the functionality works as intended, and its implementation can freely be changed.
The task_children variant is the intermediate one: all children are run in Tasks. The caller can decide if and how to wait for completion:
```
tasks = await task_children()
await asyncio.gather(*tasks)  # can add other tasks here as well
```
This guarantees that the functionality starts as intended. Its completion relies on the caller having some knowledge, though.
The coro_children variant is the "lightest": none of the children are actually run. The caller is responsible for their entire lifetime:
```
tasks = await coro_children()
# children don't actually run yet!
await asyncio.gather(*tasks)  # can add other tasks here as well
```
This completely relies on the caller to start and wait for the sub-tasks.
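
To make the difference between the last two variants concrete, here is a minimal sketch (not part of the original measurements) that records which children have started before the caller gathers them. The `started` list and the `compare` helper are hypothetical names introduced for illustration, and the variants are shortened to three children each.

```python
import asyncio

started = []  # hypothetical bookkeeping: which children have begun running


async def child(i):
    started.append(i)
    await asyncio.sleep(0.01)  # stand-in for a non-trivial payload


async def task_children():
    # children are wrapped in Tasks, so the event loop schedules them right away
    return [asyncio.create_task(child(i)) for i in range(3)]


async def coro_children():
    # children are bare coroutine objects; nothing is scheduled yet
    return [child(i) for i in range(3)]


async def compare():
    started.clear()
    tasks = await task_children()
    await asyncio.sleep(0)  # yield once so already-scheduled Tasks can start
    started_as_tasks = len(started)
    await asyncio.gather(*tasks)

    started.clear()
    coros = await coro_children()
    await asyncio.sleep(0)  # yielding does not help: nothing was scheduled
    started_as_coros = len(started)
    await asyncio.gather(*coros)  # only now are the children started and awaited
    return started_as_tasks, started_as_coros, len(started)


if __name__ == "__main__":
    print(asyncio.run(compare()))
```

With the default event loop, the Tasks have all started after one yield, while the bare coroutines only start once the caller passes them to gather.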

Using the encapsulated pattern is a safe default – it ensures that the coroutine "just works". Notably, a coroutine using an internal gather still appears like any other coroutine.
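
As a rough illustration using the class from the question, the sketch below compares the nested structure with a single flat gather. The `flat` and `main` helpers are assumptions added here, and the methods return their results so there is something to compare.

```python
import asyncio


class MyClass:
    async def master_method(self):
        # outer gather: 10 sub_method coroutines
        return await asyncio.gather(*(self.sub_method() for _ in range(10)))

    async def sub_method(self):
        # inner gather: 10 my_task coroutines per sub_method
        return await asyncio.gather(*(self.my_task() for _ in range(10)))

    async def my_task(self):
        return "task done"


async def flat(obj):
    # hypothetical alternative: one gather over all 100 tasks
    return await asyncio.gather(*(obj.my_task() for _ in range(100)))


def main():
    obj = MyClass()
    nested = asyncio.run(obj.master_method())   # list of 10 lists of 10 results
    flat_results = asyncio.run(flat(obj))       # list of 100 results
    flattened = [r for group in nested for r in group]
    return nested, flattened, flat_results


if __name__ == "__main__":
    nested, flattened, flat_results = main()
    print(len(nested), len(flattened), flattened == flat_results)
```

The caller of master_method just awaits one coroutine; the only visible difference from the flat version is that the results come back grouped per sub_method.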

`gather` speed?

The gather utility a) ensures that its arguments are run as Tasks and b) provides a Future that triggers once the tasks are done. Since gather is usually used when one would run the arguments as Tasks anyway, there is no additional overhead from this; likewise, these are regular Tasks and have the same performance/priority characteristics¹ as everything else.
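
Both points can be checked with a small sketch. The `inspect_gather` helper is a name made up for this example; it only uses `asyncio.isfuture` and `asyncio.current_task`, and assumes Python 3.8+ for `Task.get_name`.

```python
import asyncio


async def child():
    await asyncio.sleep(0)
    # report the name of the Task this coroutine is running in
    return asyncio.current_task().get_name()


async def inspect_gather():
    outer_name = asyncio.current_task().get_name()
    fut = asyncio.gather(child(), child())  # bare coroutines go in
    is_future = asyncio.isfuture(fut)       # a Future-like object comes out
    child_names = await fut
    return is_future, outer_name, child_names


if __name__ == "__main__":
    print(asyncio.run(inspect_gather()))
```

Each child reports its own Task, distinct from the Task of the coroutine that called gather.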

The only overhead is from the wrapping Future; this takes care of bookkeeping (ensuring the arguments are tasks) and then only waits, i.e. does nothing. On my machine, measuring the overhead shows that it takes on average about twice as long as running a no-op Task. This by itself should already be negligible for any real-world task.
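
The following is one way such a measurement could be set up. It is a sketch and not the exact test referred to above; the helper names and the iteration count are arbitrary choices, and the numbers will vary by machine and Python version.

```python
import asyncio
import time


async def noop():
    pass


async def time_plain_task(n):
    # baseline: create and await a no-op Task, n times
    start = time.perf_counter()
    for _ in range(n):
        await asyncio.create_task(noop())
    return (time.perf_counter() - start) / n


async def time_gathered_task(n):
    # same no-op payload, but wrapped in a gather each time
    start = time.perf_counter()
    for _ in range(n):
        await asyncio.gather(noop())
    return (time.perf_counter() - start) / n


async def measure(n=10_000):
    plain = await time_plain_task(n)
    gathered = await time_gathered_task(n)
    return plain, gathered


if __name__ == "__main__":
    plain, gathered = asyncio.run(measure())
    print(f"plain task: {plain * 1e6:.2f} us, gathered: {gathered * 1e6:.2f} us")
```

Either way the per-call cost is tiny compared to any payload that actually does I/O.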

In addition, the pattern of gathering child tasks inherently means that there is a tree of gather nodes. Thus the number of gather nodes is usually much lower than the number of tasks. For example, for the case of 10 tasks per gather, a total of only 11 gathers is needed to handle a total of 100 tasks.

master_method                                                  0

sub_method         0          1          2          3          4          5 ...

my_task       0123456789 0123456789 0123456789 0123456789 0123456789 0123456789 ...
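
The count can be verified with a small sketch that mirrors the question's structure. The `counted_gather` wrapper and the module-level counters are hypothetical helpers added only to do the counting.

```python
import asyncio

gather_calls = 0  # how many gather nodes were created
tasks_run = 0     # how many leaf tasks actually ran


async def counted_gather(*aws):
    # thin wrapper around asyncio.gather that counts its invocations
    global gather_calls
    gather_calls += 1
    return await asyncio.gather(*aws)


async def my_task():
    global tasks_run
    tasks_run += 1


async def sub_method():
    await counted_gather(*(my_task() for _ in range(10)))


async def master_method():
    await counted_gather(*(sub_method() for _ in range(10)))


def count_nodes():
    global gather_calls, tasks_run
    gather_calls = tasks_run = 0
    asyncio.run(master_method())
    return gather_calls, tasks_run


if __name__ == "__main__":
    print(count_nodes())  # (11, 100): 1 outer gather + 10 inner, 100 leaf tasks
```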

¹Which is to say, none. asyncio currently has no concept of Task priorities.

Thanks! So performance-wise, whether I run all tasks in the master_method or run them with this nested structure, they should complete in the same time, apart from the extra time taken by the gather itself as you showed in your test? Can you explain a bit more what causes this extra delay when you wrap in the extra gather(), if it is easy? Otherwise it is clear, thanks. — KZiovas, Oct 27 '21 at 12:20
@KZiovas Yes, the performance should be very similar. Basically, the extra overhead is from creating the Future and having it wait for the tasks. That's extra code that needs to run, therefore it has *some* overhead compared to not running the code. But since most of what it does is waiting (i.e. doing nothing), it's not severe. — MisterMiyagi, Oct 27 '21 at 12:27
