6

From the well-known C++ coroutine library cppcoro (search for "Don't allow any use of co_await inside the generator coroutine." in the source file generator.hpp), and from my own experiments, I know that a coroutine using co_yield cannot also use co_await.

Since a generator using co_yield must be synchronous, what's the advantage of using co_yield over a simple stateful lambda?

For example:

#include <iostream>
#include <cppcoro/generator.hpp> // or any equivalent co_yield-based generator type

using cppcoro::generator;

generator<int> g()
{
    for (auto i = 0; i < 9; ++i)
    {
        co_yield i;
    }
}

int main()
{
    auto fn_gen = [i = 0]() mutable { return i++; };

    // Lambda way
    for (auto i = 0; i < 9; ++i)
    {
        std::cout << fn_gen() << std::endl;
    }

    // co_yield way
    for (auto i : g())
    {
        std::cout << i << std::endl;
    }
}

What's the special value of co_yield in contrast to a simple stateful lambda in C++20?

Please see the updated MWE: https://godbolt.org/z/x1Yoen7Ys

In the updated example, the output is totally unexpected when using co_await and co_yield in the same coroutine.

xmllmx
  • You totally can use `co_await` inside functions that also use `co_yield`. There are even [legitimate use cases for it](https://stackoverflow.com/a/64083986/734069). – Nicol Bolas Dec 30 '21 at 19:02
  • Short answer: it depends. For a simple generator, a stateful lambda is probably preferable. But coroutine generators will probably be simpler in more complex cases, where the state is not as trivial as a counter. For example, a recursive iterator over files in a directory. – prapin Dec 30 '21 at 19:03
  • I tested on clang-13, using `co_await` and `co_yield` will result in an unexpected `resume` order. Maybe it's a bug in the coroutines implementation. – xmllmx Dec 30 '21 at 19:05
  • @xmllmx: You'd have to show the total code being employed in your test case, including the coroutine machinery types. Obviously, cppcoro's `generator` type has an explicit mechanism to hose attempts to `co_await` inside of a generator. – Nicol Bolas Dec 30 '21 at 19:07
  • A stateful lambda can do the same thing without more complexity. @prapin – xmllmx Dec 30 '21 at 19:07
  • @xmllmx: Elegance is in the eye of the beholder. Also, it depends entirely on how complex the function is and whatever state it needs to track. – Nicol Bolas Dec 30 '21 at 19:07
  • The coroutine could have more code after the loop, to yield more values in several different ways. And automatically keep track of how far it has come. – BoP Dec 30 '21 at 19:22
  • Well, for one thing, those two examples aren't equivalent. In lots of ways. The generator gives you the numbers from 0 to 8. The lambda gives you the numbers from 0 to ... You can call `g()` twice and the second time it starts at 0; you'd have to make a copy of `fn_gen` before you called it at all to get that behavior. Etc. – Barry Dec 30 '21 at 19:23

2 Answers

7

For trivial generators with minimal internal state and code, a small functor or lambda is fine. But as your generator code becomes more complex and requires more state, it becomes less fine. You have to stick more members into your functor type or your lambda's capture list. You have bigger and bigger code inside of the function. Etc.

At the most extreme, a co_yield-based generator can hide all of its implementation details from the outside world, simply by putting its definition in a .cpp file. A stateful functor cannot hide its internal state, because that state consists of members of the type, which the outside world must be able to see. The only way to avoid that is through type-erasure, such as with something like std::function. At which point, you've gained basically nothing over just using co_yield.
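For instance, a sketch of that split might look like the following (assuming a generator type such as cppcoro::generator; the file names numbers.hpp/numbers.cpp are just illustrative):

// numbers.hpp - the interface exposes no state at all
#include <cppcoro/generator.hpp>

cppcoro::generator<int> numbers(); // how the values are produced is invisible to callers

// numbers.cpp - all of the generator's state lives in its coroutine frame
#include "numbers.hpp"

cppcoro::generator<int> numbers()
{
    for (int i = 0; i < 9; ++i)
    {
        co_yield i;
    }
}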

Also, co_await can be used with co_yield. Cppcoro's generator type explicitly hoses it, but cppcoro isn't C++20. You can write whatever generator you want, and that generator can support uses of co_await for specific purposes.

Indeed, you can make asynchronous generators, where sometimes you can yield a value immediately, and sometimes you can schedule the availability of a value with some asynchronous process. The code invoking your async generator can co_await on it to extract values from it, rather than treating it like a functor or an iterator pair.
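A rough sketch of what that could look like, using cppcoro's async_generator type rather than its plain generator (wait_for_next_sample() is a hypothetical placeholder for whatever asynchronous source you actually have; here it never really suspends):

#include <cppcoro/async_generator.hpp>
#include <cppcoro/task.hpp>
#include <coroutine>
#include <iostream>

// Hypothetical stand-in for a real asynchronous source.
std::suspend_never wait_for_next_sample() { return {}; }

cppcoro::async_generator<int> samples()
{
    for (int i = 0; i < 9; ++i)
    {
        co_await wait_for_next_sample(); // suspend until the next value is available
        co_yield i;                      // then hand it to the consumer
    }
}

cppcoro::task<> consume()
{
    auto gen = samples();
    // async_generator is consumed with awaitable begin()/operator++ rather than a plain range-for
    for (auto it = co_await gen.begin(); it != gen.end(); co_await ++it)
    {
        std::cout << *it << '\n';
    }
}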

Nicol Bolas
  • Please see my updated example in the original question illustrating the unexpected output when using both `co_await` and `co_yield` in the same coroutine. – xmllmx Dec 30 '21 at 19:36
  • @xmllmx: You did it wrong. I don't see any `await_transform` usage in your promise type, nor do I see any affordance in your generator interface for a generator that has yet to return a value. – Nicol Bolas Dec 30 '21 at 20:11
  • await_transform is optional. If there is no co_await, the output is as expected. – xmllmx Dec 30 '21 at 20:13
  • @xmllmx: It is optional in the sense that you may not need to use it to achieve some particular effect that you want to achieve. And to be honest, I don't know if you need it for this. But I don't see *anything* in your promise or future types that does anything that would be needed to allow for `co_await` usage in your generator. Where does `Awaiter` schedule the resumption of the coroutine that's waiting on it? Invoking a generator may or may not return a value, but you couldn't tell from the interface code. You have to use the coroutine machinery correctly to get the effect you want. – Nicol Bolas Dec 30 '21 at 20:17
  • Thank you very much for the hint of await_transform, I’ll reconsider the MWE. – xmllmx Dec 30 '21 at 20:25
0

A stateful lambda or a custom functor is almost always the better choice, IMHO. In fact, you can often get more efficient "coroutine-like" behavior by just using lambdas. Compare this:

Demo

#include <cstdio>
#include <cstdint>


int main() {

    enum class cont_point : uint8_t {
        init,
        first,
        second,
        third,
        end,
    };

    // Hand-rolled "coroutine": each call resumes at the stored continuation point
    auto lambda = [cp = cont_point::init]() mutable -> void {
        switch(cp) {
            case cont_point::init:
                printf("init\n");
                cp = cont_point::first;
                break;
            case cont_point::first:
                printf("first\n");
                cp = cont_point::second;
                break;
            case cont_point::second:
                printf("second\n");
                cp = cont_point::third;
                break;
            case cont_point::third:
                printf("third\n");
                cp = cont_point::end;
                break;
            default:
                return ;
        }
    };
    
    lambda();
    lambda();
    lambda();
    lambda();
}

Yields:

init
first
second
third

If you check the assembly, you will see that the code is optimized to perfection, which gives you a hint of how good compilers are at optimizing lambdas. The same is not true for coroutines (not yet, at least).
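For comparison, the coroutine counterpart of the lambda above could look roughly like this. resumable is a hypothetical, bare-bones wrapper around std::coroutine_handle written just for this sketch, not a library type:

#include <coroutine>
#include <cstdio>
#include <exception>

struct resumable {
    struct promise_type {
        resumable get_return_object() {
            return resumable{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    explicit resumable(std::coroutine_handle<promise_type> h) : handle(h) {}
    resumable(resumable&& other) noexcept : handle(other.handle) { other.handle = nullptr; }
    ~resumable() { if (handle) handle.destroy(); }

    // Each call resumes the coroutine up to its next suspension point.
    void operator()() { if (handle && !handle.done()) handle.resume(); }

    std::coroutine_handle<promise_type> handle;
};

resumable steps() {
    std::printf("init\n");   co_await std::suspend_always{};
    std::printf("first\n");  co_await std::suspend_always{};
    std::printf("second\n"); co_await std::suspend_always{};
    std::printf("third\n");
}

int main() {
    auto coro = steps(); // suspended at initial_suspend, nothing printed yet
    coro(); // init
    coro(); // first
    coro(); // second
    coro(); // third
}

Whether the compiler manages to elide the coroutine frame allocation here depends on the compiler and optimization level, which is the efficiency gap described above.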

But

Coroutines offer one very interesting niche which no other language construct can fill: they solve the cactus stack problem. The cactus stack problem arises when code forks but would need to keep running on the same stack - that is not possible, so a separate stack must be created. If the thread executing on that stack then forks again, there must be yet another stack, and so on. What's even worse is that nobody knows in advance how big these stacks need to be.

C++20 coroutines are stackless. That does not mean they never use a stack: data that does not live across a suspension point goes on the executing task's stack and can safely be dropped during stack unwinding, while all the stateful data lives in the coroutine frame, which typically (and unfortunately, even in simple-to-optimise cases) rests on the heap (allocated via operator new). The decision about what goes into the coroutine frame and what goes on the call stack as execution proceeds is made by the compiler in a process called the coroutine transformation. It is this process that makes coroutines uniquely able to solve the cactus stack problem, as follows:

Every newly created coroutine instance keeps a predefined amount of space on the heap, comparable to an object with its data fields. When the coroutine executes, additional data is put on the stack of whatever task is running the continuation of the coroutine. This way, the state can grow dynamically, and we don't get the stack-overflow problems of stackful coroutines; we only have to make sure all threads have sufficient stack space available to them, as we usually do.
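You can observe that frame allocation directly: if the promise type provides its own operator new/operator delete, the compiler uses them to allocate and free the coroutine frame. A minimal sketch (the traced type below is hypothetical and exists only to log the frame size):

#include <coroutine>
#include <cstdio>
#include <cstdlib>
#include <exception>

struct traced {
    struct promise_type {
        // If present, these are used by the compiler to allocate/free the coroutine frame.
        void* operator new(std::size_t size) {
            std::printf("coroutine frame: %zu bytes\n", size);
            return std::malloc(size);
        }
        void operator delete(void* ptr) { std::free(ptr); }

        traced get_return_object() {
            return traced{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    explicit traced(std::coroutine_handle<promise_type> h) : handle(h) {}
    traced(traced&& other) noexcept : handle(other.handle) { other.handle = nullptr; }
    ~traced() { if (handle) handle.destroy(); }

    std::coroutine_handle<promise_type> handle;
};

traced example() {
    int data[64] = {};            // must survive the suspension point, so it lives in the frame
    co_await std::suspend_always{};
    std::printf("%d\n", data[0]); // still valid after resumption
}

int main() {
    auto t = example();  // prints the frame size the compiler chose
    t.handle.resume();   // run up to the co_await
    t.handle.resume();   // resume past it; prints 0
}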

glades