Forward: When this post says just "coroutines", I am referring to the concept of a coroutine, not the specific C++20 feature. When talking about this feature, I will refer to it as "co_await
" or "co_await coroutines".
On dynamic allocation
Cppreference sometimes uses looser terminology than the standard. co_await
as a feature "requires" dynamic allocation; whether this allocation comes from the heap or from a static block of memory or whatever is a matter for the provider of the allocation. Such allocations can be elided in arbitrary circumstances, but since the standard does not spell them out, you still have to assume that any co_await coroutine may dynamically allocate memory.
co_await coroutines do have mechanisms for users to provide allocation for the coroutine's state. So you can substitute the heap/free store allocation for any particular pool of memory you prefer.
co_await
as a feature is well-designed to remove verbosity from the point of use for any co_await
-able objects and functionality. The co_await
machinery is incredibly complicated and intricate, with lots of interactions between objects of several types. But at the suspend/resume point, it always looks like co_await <some expression>
. Adding allocator support to your awaitable objects and promises requires some verbosity, but that verbosity lives outside of the place where those things get used.
Using alloca
for a coroutine would be... highly inappropriate for most uses of co_await
. While the discussion around this feature tries to hide it, the fact of the matter is that co_await
as a feature is designed for asynchronous use. That's its intended purpose: to halt the execution of a function and schedule that function's resumption on potentially another thread, then shepherding any eventually generated value to some receiving code which may be somewhat distant from the code which invoked the coroutine.
alloca
is not appropriate for that particular use case, since the caller of the coroutine is allowed/encouraged to go do whatever so that the value can be generated by some other thread. The space allocated by alloca
would therefore no longer exist, and that is kind of bad for the coroutine that lives in it.
Also note that allocation performance in such a scenario will generally be dwarfed by other considerations: thread scheduling, mutexes, and other things will often be needed to properly schedule the coroutine's resumption, not to mention the time it takes to get the value from whatever asynchronous process is providing it. So the fact that a dynamic allocation is needed is not really a substantial consideration in this case.
Now, there are circumstances where in-situ allocation would be appropriate. Generator use cases are for when you want to essentially pause a function and return a value, then pick up where the function left off and potentially return a new value. In these scenarios, the stack for the function which invokes the coroutine will certainly still be around.
co_await
supports such scenarios (though co_yield
), but it does so in a less-than-optimal way, at least in terms of the standard. Because the feature is designed for up-and-out suspension, turning it into a suspend-down coroutine has the effect of having this dynamic allocation that doesn't need to be dynamic.
This is why the standard does not require dynamic allocation; if a compiler is smart enough to detect a generator pattern of usage, then it can remove the dynamic allocation and just allocate the space on the local stack. But again, this is what a compiler can do, not must do.
In this case, alloca
-based allocation would be appropriate.
How it got into the standard
The short version is that it got into the standard because the people behind it put in the work, and the people behind the alternatives did not.
Any coroutine idea is complicated, and there will always be questions about implementability with regard to them. For example, the "resumeable functions" proposals looked great, and I would have loved to see it in the standard. But nobody actually implemented it in a compiler. So nobody could prove that it was actually a thing you could do. Oh sure, it sounds implementable, but that doesn't mean it is implementable.
Remember what happened the last time "sounds implementable" was used as the basis for adopting a feature.
You don't want to standardize something if you don't know it can be implemented. And you don't want to standadize something if you don't know if it actually solves the intended problem.
Gor Nishanov and his team at Microsoft put in the work to implement co_await
. They did this for years, refining their implementation and the like. Other people used their implementation in actual production code and seemed quite satisfied with its functionality. Clang even implemented it. As much as I personally don't like it, it is undeniable that co_await
is a mature feature.
By contrast, the "core coroutines" alternatives that were brought up a year ago as competing ideas with co_await
failed to gain traction in part because they were difficult to implement. That's why co_await
was adopted: because it was a proven, mature, and sound tool that people wanted and had the demonstrated ability to improve their code.
co_await
is not for everyone. Personally, I will likely not use it much, as fibers work much better for my use cases. But it is very good for its specific use case: up-and-out suspension.