
tl;dr:

In C#, is there any guarantee that a lazy iterator function that calls nothing but itself, and has a valid recursion exit condition, will not cause a stack overflow?


Detailed question:

I know that, as a rule, you get no guarantee that the C# compiler (or the JIT) will emit a tail call, so while you may get Tail Call Optimization (TCO), there are no guarantees.

Given that TCO isn't guaranteed, I'm wondering about lazy iterator functions (using `yield return` etc.): because of their nature as coroutines, does each tail call in one even take up stack space? My intuition about coroutines, given their re-entrancy, is that each tail call is optimized by default, as the ability to jump out of the function and into the next one from the parent's frame, instead of creating a new frame, seems natural.

Does C# behave this way, or do C# iterator functions' recursive calls create a new frame on top of the current one, rather than popping back out to the parent frame and re-entering with the new parameters?


Example:

public static IEnumerable<IEnumerable<T>> GeneratePermutations<T>(this IEnumerable<T> choices, int numberToChoose)
{
    if (numberToChoose == 1)
    {
        foreach (var choice in choices)
            yield return new T[] { choice };
        yield break;
    }

    var subPermutations = choices.SelectMany(choice =>
        choices.Where(elem => !EqualityComparer<T>.Default.Equals(elem, choice))
            .GeneratePermutations(numberToChoose - 1)
            .Select(permutation => (new T[] { choice }).Concat(permutation)));
    foreach (var perm in subPermutations)
        yield return perm;
}

My intuition is based on the idea that, in the above example, `subPermutations` is simply a heaped computation. Upon the call to iterate it, the runtime can know it's a heaped computation (it's part of the function's signature that it's an iterator function), and could therefore immediately jump out of its current frame, expanding the heaped computation into a new frame, costing no extra stack space over what was there before the recursive call was attempted...

This intuition may be totally unfounded...

Jimmy Hoffa
    Would you care to provide a sample iterator block that you feel the compiler ought to be able to optimize? – Servy Aug 14 '14 at 19:06
    @Servy: I don't think `yield return` is recursive anyway, so TCO is not relevant here. It's simply a state machine. – Robert Harvey Aug 14 '14 at 19:08
  • @RobertHarvey iterator blocks are not *inherently* recursive. You can write one that is, though. Consider this method `public static IEnumerable<int> Foo() { yield return 1; foreach (var n in Foo()) yield return n; }`. If you try to iterate that sequence you'll eventually blow out the stack, and I don't see any particularly good ways of trying to leverage TCO. – Servy Aug 14 '14 at 19:09
  • @Servy That's a good point. – Robert Harvey Aug 14 '14 at 19:12
    If we had a `yield foreach` statement *then* I could potentially foresee being able to leverage TCO. The only real way to leverage TCO *without* that is to try to recognize the pattern of `foreach`-ing over the recursive call and yielding all of the items exactly, without doing anything else, and that seems like a pretty sketchy idea. – Servy Aug 14 '14 at 19:13
  • @Servy good point, I forgot to mention I was referring only to recursive iterators. It just seems to me that because yield returns are only computations until each element is asked for, and upon each exit from them it becomes a stateful computation again not taking up stack space - technically that computation should be decomposable, so each frame upon tail call could flatten back to an in-memory computation in the parent's frame while executing the tail call as re-entrancy into the heaped computation, expanding it into a stack frame only until it hits a yield... – Jimmy Hoffa Aug 14 '14 at 19:17
  • @JimmyHoffa I really think you need an example here. Provide a sample iterator block, show how you're using it, and state what you think the stack ought to look like with/without TCO. As it is, it's not clear what exactly you're referring to. – Servy Aug 14 '14 at 19:21
  • Related: http://programmers.stackexchange.com/questions/216750/is-it-possible-to-implement-an-infinite-ienumerable-without-using-yield-with-onl – Robert Harvey Aug 14 '14 at 19:22
  • @Servy example added – Jimmy Hoffa Aug 14 '14 at 19:32
  • @JimmyHoffa I can't *possibly* imagine how TCO could be applied to that method. You're manipulating the results of the recursive call before propagating those results back to the caller. That means no TCO (setting aside the complexity of the whole iterator block thing for a second). TCO only works when the result of the current method becomes identical to the result of the recursive call. That's not going on here. – Servy Aug 14 '14 at 19:44
  • @Servy TCO is possible whenever the recursive call isn't a parameter of something, so the parameters to the recursive call itself can be captured (just like the frame is captured and transferred off the stack to the heap each time it re-exits, and is expanded back from heap to stack every time it re-enters). I'm just wondering if, upon a tail call, it puts the frame on the heap before entering the next frame, or leaves it on the stack; both are possible, as iterators allow it, it's just a question of whether the implementation takes advantage of that allowance. – Jimmy Hoffa Aug 14 '14 at 19:48
  • @JimmyHoffa That's just it. In your example the recursive call *is* a parameter to something. It's a parameter to `Select`, which itself is a parameter to `Concat`. If you weren't calling either operation on the results of the recursive call then there'd be a discussion to be had. – Servy Aug 14 '14 at 19:53
  • @Servy yeah - bad example I chose, you're right if I blew out the yields to make it a normal function it's absolutely not able to be TCO'd. – Jimmy Hoffa Aug 14 '14 at 19:56

1 Answer


So, let's open with an example method, so that we have something to reference:

public static IEnumerable<int> Foo()
{
    yield return 1;
    foreach (var n in Foo())
        yield return n;
}

Here's our recursive iterator block. I just want to take a moment to call out a few properties of this method that may (or may not) end up being relevant.

  • There is a recursive call, but the recursive call is after a yield.

  • When we do reach our recursive call, the only thing we do after that point is yield all of its results. There is no projection on each item, no finally block, nothing after those yields, etc.

So, what happens when some code goes and writes the following?

foreach(var n in Foo())
    Console.WriteLine(n);

Well, the first thing that happens when we reach this statement is to evaluate Foo() to a value. In this case, this creates the state machine that represents the generator of the sequence. We've not actually executed any of the code in the method body though.

Next, we call MoveNext. We hit our first yield statement, yield a value, and print it out.

After that, the outer-most level calls MoveNext again. Here our state machine's MoveNext method reaches its own foreach block. It will, like the Main method, evaluate Foo() to a value, creating a second state machine. It will then immediately call MoveNext on that state machine. That second state machine will reach its first yield, it will yield the value to the first iterator, which will yield that back up to the main method, which will print it.

Then the main method calls MoveNext again. The first iterator asks the second iterator for its second item, the second iterator reaches its foreach block, creates a third iterator, and gets a value from it. The value gets passed all the way up.

As we can see here, each time we ask our top-level iterator for another item, the stack is always one level deeper than before. Despite the fact that we're using state machines, and that creating the iterators doesn't consume a lot of stack space, getting the next item in the sequence will consume more and more stack space, until we run out.

When running the code we can see that things work out exactly as described here, and the stack will overflow.

So, how could this possibly be optimized?

Well, what we're hoping to do here is for that top level iterator to realize that when it gets to the foreach that "from now on, the rest of the items in my sequence are identical to all of the items in the recursive call". This does sound a lot like a typical TCO situation.

So at this point we have two issues to solve. First, if we recognize that we're in this situation, can we actually avoid the creation of additional state machines, and thus the continually increasing stack usage? It wouldn't be all that easy, likely not quite as easy as traditional non-iterator-block TCO. You'd need to set all of the instance fields of the state machine to whatever the instance fields would be of the state machine that would be created if you had called Foo. I'm just going to wave my hands at this point and say that this sounds possible, but not exactly super easy.
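To make that "set all of the instance fields" idea concrete, here is a rough hand-rolled sketch of the kind of state machine the compiler emits for Foo. All names here are invented for illustration (the real generated type uses compiler-reserved identifiers and a goto-based layout), and the comments mark where such a hypothetical optimization would have to intervene:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical hand-rolled version of the state machine the compiler
// generates for Foo(); field and type names are invented.
class FooStateMachine : IEnumerator<int>
{
    private int _state;               // which "label" to resume at
    private IEnumerator<int> _inner;  // the recursive call's state machine

    public int Current { get; private set; }
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:                   // start of the method body
                Current = 1;          // yield return 1;
                _state = 1;
                return true;
            case 1:                   // resumed inside the foreach
                // As compiled today: allocate a nested machine and
                // delegate to it, so each top-level MoveNext call runs
                // through one nested MoveNext frame per recursion level.
                _inner = _inner ?? new FooStateMachine();
                if (_inner.MoveNext())
                {
                    Current = _inner.Current;
                    return true;
                }
                return false;
                // The hypothetical optimization: rather than allocating
                // _inner, reset this machine's own fields (_state = 0)
                // and continue, keeping the chain one machine deep.
        }
        return false;
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }
}
```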

Then we have the other problem: how can we recognize that we're actually in this position where TCO is valid? We need to be recursively calling ourselves, we need to be doing nothing with that method call other than iterating the whole thing and yielding each item exactly as it stands, we need to not be in a try or using block (else the finally block would be lost), and there can't be any statements after that iteration.

Now, if there were a `yield foreach` operator then this wouldn't be so bad. You'd just set up the rule that if the very last statement in the iterator block is a `yield foreach` operator with a recursive call to the method at the very end, apply TCO. Sadly, in C# (unlike some other .NET languages) we have no `yield foreach` operator. We need to type out the whole `foreach` loop, while also not doing anything other than yielding the item exactly as it stands. That seems... a bit awkward.
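For comparison, F# does have this construct (its `yield!` in sequence expressions), and a hypothetical C# version of Foo would make the tail position syntactically obvious. The `yield foreach` line below is not valid C# and is shown only as a sketch:

```csharp
// NOT valid C#: "yield foreach" is hypothetical. The F# equivalent is
//     let rec foo () = seq { yield 1; yield! foo () }
public static IEnumerable<int> Foo()
{
    yield return 1;
    yield foreach Foo();  // yield every item of the recursive call, then end
}
```

With the whole tail of the sequence delegated in a single statement, the compiler could see at a glance that the TCO-like field reset described above is safe to apply.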

To recap:

  • Is it possible for the compiler to use Tail Call Optimization for recursive iterator blocks?
    • Most likely.
  • Is it done by the compiler ever?
    • It doesn't appear so.
  • Would it be particularly feasible to add this support into the compiler?
    • Probably not.
chue x
Servy
  • 'Foo() to a value, creating a second state machine. It will then immediately call MoveNext on that state machine' -> I am wondering if, before it "immediately" calls MoveNext, because it's operating *in* a state machine, it knows well enough to just heap the current frame, with the next layer's state machine in hand to call, as it has the statefulness necessary for the next layer's sequence generation - the results of which could be filled in as you re-enter its parent frame after it's finished... Only possible if the parent frame is re-entrant (such as a recursive iterator's) – Jimmy Hoffa Aug 14 '14 at 19:53
  • @JimmyHoffa That assumes that it knows that the only thing that is going to be done with the result of that call to `MoveNext` is for that iterator's value to be yielded to the current method's parent. When creating the iterator or getting the next value it doesn't necessarily know if the only thing that will be done with that value is to yield it to its caller. Something may need to be done first. Now with a `yield foreach` you do know that, every time. When you don't have that operator it's trickier for the compiler to verify that the optimization is valid. – Servy Aug 14 '14 at 19:57
  • It's not technically TCO I'm talking about. I'm referring to the ability for the generators to be suspended at *any time* and placed to take *heap* space instead of *stack* space, is that fact taken advantage of to suspend the current generator upon a call to itself. It *can* do that, these functions are arbitrarily stoppable/resumable... It's an implementation detail whether or not it *does* do that. Technically it's a type of TCO - though not the standard tail-call turned into a loop technique. Imagine the frames being popped between heap and stack like towers of hanoi to reach the top frame – Jimmy Hoffa Aug 14 '14 at 20:00
  • @JimmyHoffa No, that's not what's going on here. It's *not* possible to arbitrarily take a method off of the stack, throw it onto the heap, and clear out the stack. There's just some slick moves going on behind the curtain to make you think that that's going on. Take a look at some of the links Robert provided earlier. The iterator block, when compiled, will have all of the code stripped out, a new class will be created implementing `IEnumerable`, that class's `MoveNext` method will contain the code you had in your iterator block, and all `yield` statements will have a label, – Servy Aug 14 '14 at 20:05
  • @JimmyHoffa continued: at the start of the method it'll have a `goto` to go to the label that was reached by the previous call to `MoveNext` (which it'll know because a variable is set just before returning), where the method can continue on. All of the code between those labels (the yields) is just executed on the stack of the `MoveNext` method. In this case, that code between yields is code to create a new state machine and call its `MoveNext` method. – Servy Aug 14 '14 at 20:07
  • Your implementation details are very appreciated! I've always figured it was *something* like that, though for some reason I always thought it was likely more run-time proxy generation shit rather than compile time. This makes a lot of sense now. I'm not convinced it's *impossible*, just that it would be a pretty far cry from what the standard code generation does to make them pause and capture before tail calls, with a label immediately afterwards, rather than only at yields.. – Jimmy Hoffa Aug 14 '14 at 20:14
  • @JimmyHoffa I explained how it might be possible, and it would likely not be as you just described. Realistically you'd be doing it any time you had a `yield foreach` statement, as that is the exact case where the optimization can be applied, but detecting situations where the author clearly would have used a `yield foreach` given that C# doesn't have one is at least a little bit tricky for both the compiler and the user of the language. So is it possible, yes, but the problem is people would just need to know the "magic words" so to speak to ensure that it actually worked. – Servy Aug 14 '14 at 20:19