
I am trying to understand how iterators work internally, to mitigate some concerns I have about thread-safety. Let's consider, for example, the following simple iterator:

using System.Collections.Generic;

public class MyClass
{
    public static IEnumerable<int> MyMethod()
    {
        yield return 10;
        yield return 20;
        yield return 30;
    }
}

I can see the compiler-generated state machine that is created behind the scenes, after copy-pasting this code to SharpLab.io. It is a class that implements the interfaces IEnumerable<int> and IEnumerator<int>, and contains the MoveNext method below:

private bool MoveNext()
{
    switch (<>1__state)
    {
        default:
            return false;
        case 0:
            <>1__state = -1;
            <>2__current = 10;
            <>1__state = 1;
            return true;
        case 1:
            <>1__state = -1;
            <>2__current = 20;
            <>1__state = 2;
            return true;
        case 2:
            <>1__state = -1;
            <>2__current = 30;
            <>1__state = 3;
            return true;
        case 3:
            <>1__state = -1;
            return false;
    }
}

The identifiers <>1__state and <>2__current are private fields of this class:

private int <>1__state;
private int <>2__current;

I noticed a pattern in this code. First the <>1__state field is reset to -1, then <>2__current is assigned the next iteration value, and then <>1__state is advanced to the next state. My question is: what is the purpose of the <>1__state = -1; line? I compiled this code (after painfully renaming all the illegal identifiers) and confirmed that this line can be commented out without affecting the functionality of the class. I don't believe that the C# compiler team just forgot this seemingly purposeless piece of code. Surely there must be a purpose for its existence, and I would like to know what it is.

Theodor Zoulias
  • What is `<>` here ? –  Oct 05 '19 at 09:21
  • @OlivierRogier it is a prefix that appears to be valid only in compiler-generated code. It is certainly invalid according to the C# specification, so I had to rename the variables to compile this code. – Theodor Zoulias Oct 05 '19 at 09:24
  • I don't understand, because `<>` is the diamond operator: it allows true generic polymorphism on open types, and it is not available in C# yet, as far as I know. –  Oct 05 '19 at 09:33
  • @OlivierRogier I don't think that the angle brackets carry any significant meaning here. They were probably added to make the identifier invalid in user code, to avoid clashes with it. – Theodor Zoulias Oct 05 '19 at 09:37
  • The purpose of the `this.__state = -1;` is fairly easy. It's how a state machine works. Because the state machine doesn't know what value you're requesting and potentially your user code might be a very slow webservice request, it sets state to -1 to say "I'm busy getting the next value". – Dennis VW Oct 05 '19 at 09:38
  • @Dennis1679 you have a point, but this code is linear. No one can interrupt this code and ask the iterator if it's busy or not. – Theodor Zoulias Oct 05 '19 at 09:43
  • @TheodorZoulias the compiler doesn't know that. The `MyMethod()` is just sitting there in a class. Any thread could be calling it from anywhere. You can call it 6 times in a row before you even iterate over the results. And the enumerator is smart. It will reuse itself when it can. But to do so, it needs to know what state it's in. – Dennis VW Oct 05 '19 at 10:13
  • @Dennis1679 so you say that the iterators are thread-safe? That they can be enumerated either by a single thread, or concurrently by multiple threads, and produce the same values in both cases? – Theodor Zoulias Oct 05 '19 at 10:26
  • Or maybe they are not thread-safe, but the line `this.__state = -1;` makes them a bit less thread unsafe? Btw this is my current theory about it. – Theodor Zoulias Oct 05 '19 at 10:29
  • @TheodorZoulias no it's not thread-safe. The enumerator does not have exclusive access to the collection; therefore, enumerating through a collection is intrinsically not a thread-safe procedure. – Dennis VW Oct 05 '19 at 10:55
  • What `GetEnumerator()` call does is it checks if `if (this.<>1__state == -2 && this.<>l__initialThreadId == Environment.CurrentManagedThreadId)` and returns the enumerator with state 0, or if it's not true then it creates a new Enumerator and sets the state to 0. So in the end, you will have an enumerator with state = 0. The thread stuff is just for performance reasons because if you call this method 6 times on a single thread you won't need 6 separate Enumerators, you just need to return the single one. – Dennis VW Oct 05 '19 at 10:56
  • @Dennis1679 if you are using a single enumerator per thread, then the `__state = -1` line is irrelevant, because no one will interrupt this enumerator. On the other hand, if a single enumerator is shared by multiple threads, then the `__state = -1` line has significance. But why bother adding a line that has no effect on (and actually slows down) the single-threaded case, and that has some effect in the multi-threaded case without actually making it thread-safe? – Theodor Zoulias Oct 05 '19 at 11:20
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/200437/discussion-between-dennis1679-and-theodor-zoulias). – Dennis VW Oct 05 '19 at 11:56
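
The reuse check described in the comments can be sketched by hand with legal identifiers. This is only an approximation of the compiler output, not the generated code itself: the class name MyMethodStateMachine and the field names state, current and initialThreadId are assumptions chosen for readability.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written approximation of the generated class; all names are illustrative.
public sealed class MyMethodStateMachine : IEnumerable<int>, IEnumerator<int>
{
    private int state;                     // stands in for <>1__state
    private int current;                   // stands in for <>2__current
    private readonly int initialThreadId;  // stands in for <>l__initialThreadId

    public MyMethodStateMachine(int state)
    {
        this.state = state;
        initialThreadId = Environment.CurrentManagedThreadId;
    }

    public IEnumerator<int> GetEnumerator()
    {
        // Reuse this instance only if it has never been enumerated (-2)
        // and the caller is the thread that created it.
        if (state == -2 && initialThreadId == Environment.CurrentManagedThreadId)
        {
            state = 0;
            return this;
        }
        // Otherwise hand out a fresh instance that starts in state 0.
        return new MyMethodStateMachine(0);
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public int Current => current;
    object IEnumerator.Current => current;

    public bool MoveNext()
    {
        switch (state)
        {
            case 0: state = -1; current = 10; state = 1; return true;
            case 1: state = -1; current = 20; state = 2; return true;
            case 2: state = -1; current = 30; state = 3; return true;
            case 3: state = -1; return false;
            default: return false;
        }
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }
}
```

In this sketch, an instance created with state -2 returns itself from the first GetEnumerator call on the creating thread. Any later call sees a state other than -2 and gets a new instance.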

1 Answer


There isn't one definitive answer as to why you need a state variable and set it to -1 each time you enter your switch statement. But I can think of one example where you would really need the variable.

Like I said in the comment section, the compiler isn't aware of, and doesn't really care, what the expression assigned to <>2__current does.

It might be a long-running web request to download a file. It might be the result of a calculation, or it might just be an integer as in your example. But here lies the problem: because the compiler isn't aware of what your code does, it has to assume that the code might throw an exception. Let's look at an example of what would happen if the <>1__state = -1 assignment were omitted and you ran into an exception trying to download something.

1) MoveNext is called.
2) this.<>2__current = WebRequest.GetFileAsync() throws HttpRequestException.
3) The exception is caught somewhere and the execution of the program is resumed.
4) The caller invokes MoveNext again.
5) this.<>2__current = WebRequest.GetFileAsync() throws HttpRequestException again.

So in this case, we would be stuck in a loop because the state would be changed only after successfully downloading that data.
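
The first scenario can be reproduced with a hand-written state machine that leaves out the reset to -1. This is a minimal sketch, not compiler output: the class NoResetEnumerator, the Attempts counter and the Download method (a stand-in for the failing web request) are all hypothetical names.

```csharp
using System;

// Hand-written approximation of a MoveNext that does NOT reset the state to -1.
public sealed class NoResetEnumerator
{
    private int state = 0;

    public int Current { get; private set; }

    // Counts how many times the failing operation was attempted.
    public int Attempts { get; private set; }

    // Hypothetical stand-in for a download that always fails.
    private int Download()
    {
        Attempts++;
        throw new InvalidOperationException("simulated download failure");
    }

    public bool MoveNext()
    {
        switch (state)
        {
            case 0:
                // No "state = -1;" here, so a failure leaves the state at 0.
                Current = Download(); // throws, the lines below never run
                state = 1;
                return true;
            default:
                return false;
        }
    }
}
```

Every call to MoveNext re-enters case 0 and repeats the failing operation, so Attempts grows by one per call and the enumerator never reports the end of the sequence.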

When we introduce the <>1__state = -1 assignment, the result looks quite different.

1) MoveNext is called.
2) this.<>2__current = WebRequest.GetFileAsync() throws HttpRequestException.
3) The exception is caught somewhere and execution of the program is resumed.
4) The caller invokes MoveNext again.
5) Since there's no switch case for -1, the default branch is reached, which returns false and signals the end of the sequence.
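
This behavior can be observed with a real compiler-generated iterator. The following is a minimal sketch in which the hypothetical Fail method stands in for the failing download; the names IteratorExceptionDemo, Numbers and Run are illustrative as well.

```csharp
using System;
using System.Collections.Generic;

public static class IteratorExceptionDemo
{
    // Iterator whose second element cannot be computed.
    public static IEnumerable<int> Numbers()
    {
        yield return 10;
        yield return Fail();
        yield return 30;
    }

    // Hypothetical stand-in for an operation that throws.
    private static int Fail() => throw new InvalidOperationException("simulated failure");

    // Records what each MoveNext call does and returns the log as one string.
    public static string Run()
    {
        var log = new List<string>();
        using (IEnumerator<int> e = Numbers().GetEnumerator())
        {
            log.Add(e.MoveNext().ToString());       // first element is produced
            try { e.MoveNext(); }
            catch (InvalidOperationException) { log.Add("threw"); }
            log.Add(e.MoveNext().ToString());       // state was left at -1
        }
        return string.Join(",", log);
    }
}
```

Run() returns "True,threw,False": after the exception the enumerator does not retry the failing element and does not continue to 30, it simply reports that the sequence has ended.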
Dennis VW
  • Interesting. So the purpose of this assignment is to disallow the reuse of an enumerator that has encountered an exception. I wonder if all enumerators behave the same way (for example the LINQ enumerators), or it's just the compiler-generated ones. – Theodor Zoulias Oct 05 '19 at 13:08
  • My tests have shown a difference in behavior. I made a new question about it [here](https://stackoverflow.com/questions/58257033/why-iterators-behave-differently-on-exception-that-linq-enumerables). – Theodor Zoulias Oct 06 '19 at 12:03