A question came up recently that was a learning experience for me. Something like the following was giving a "use of unassigned local variable" error:

int a;
for(int i = 0; i < 1; i++)
  a = 2;
a /= 2; //error

It's a contrived example and doesn't make sense but it gives the required error. I was aware that it's perfectly OK to use inner scopes to set variable values so long as the compiler can work out that all flows result in a definite assignment:

int a;
if(someboolean)
  a=2;
else
  a=4;

But I hadn't previously realised that inner scoped blocks that are contingent on some variable value will error, even when there is no perceptible way the variable could be "wrong":

int a;
bool alwaysTrue = true;
if(alwaysTrue)
  a = 2;
a /= 2; //error

Resolving this with a compile time constant is fine:

int a;
if(true)
  a = 2;
a /= 2; //fine

I wondered if it might be because the compiler was removing the if entirely, but a more involved statement is also fine:

int a;
for(int i = 0; true; i++){
  a = 2;
  if(i >= 10)
    break;
}
a /= 2; //fine

Perhaps this is being inlined/optimised too, but the essence of my question is this. For that first simple loop, for(int i = 0; i < 1; i++), is there actually any conceivable way that the loop will NOT run and hence the "variable a may be unassigned" is a valid assertion? Or is the static flow analysis just applying a simple rule along the lines of "any conditionally controlled code block that sets variable a is automatically deemed to have a situation where it might not run, so we short cut straight to showing an error on the subsequent use"?

Caius Jard
  • Questions having the form "why does the compiler do X" (or typically more correctly, "why does the language specification require X") are nearly always, at least to some extent, "primarily opinion-based". In very few cases, someone who actually worked on the language (like Eric Lippert, who does often answer this type of question) may show up and address it, but such questions still tend to attract a large number of speculative and less-than-useful answers. – Peter Duniho Aug 04 '19 at 06:16
  • Surely though this question has a definite answer in the language spec? (I know you pointed me there but it's hard going to find the required bit) – Caius Jard Aug 04 '19 at 06:19
  • Your question specifically? No. The specification dictates how the language works, and what compilers must do. It doesn't explain _why_ the language is the way it is, nor what the language designers _could_ have done. – Peter Duniho Aug 04 '19 at 06:20
  • Thanks for the feedback Peter, I've taken a look and I think the wording of the question at the end is OK, but the title of the question was poor (now edited). I'm keen to know if there is a situation where the loop won't run (multithreaded reflection reaches in at just the right moment and changes i to 2000?) or if it's a simplistic rule in the static flow analyser – Caius Jard Aug 04 '19 at 06:27
  • @peterduniho (also interested to know because you'd said "loops are [deemed incapable of definitely assigning a variable]" (and Matt had asserted that no inner-scope blocks are capable of definitely assigning) - but if we make the loop-shall-run test condition a constant `true` then it's fine - is it because the loop is being optimised away and your rule is accurate, or is it that some loops can and some loops can't and your rule was an oversimplification) – Caius Jard Aug 04 '19 at 06:33
  • My previous comment about loops was, as you've guessed, an oversimplification. It doesn't relate to how/whether the compiler optimizes loops (indeed, loops with `true` control expressions can't be optimized away!), but rather how the language specification requires definite assignment to be determined. I've elaborated in my answer below. – Peter Duniho Aug 04 '19 at 07:01

1 Answer

is there actually any conceivable way that the loop will NOT run and hence the "variable a may be unassigned" is a valid assertion

In your example, assuming a is a local variable, the loop must run. A local variable cannot be modified except in the thread where it is instantiated, unless it has been captured (for example by a lambda), which is not the case here. It's just that the compiler isn't required to determine that's the case, nor will it.

I will point out that your final example isn't a case of optimization. It works just like the while (true) case which you've already established allows the compiler to see the variable as definitely assigned.

In terms of "why", there are two ways to interpret that question. The easy way is "why does the compiler do this?" and the answer is "because the language specification says so".

Language specifications aren't always the easiest thing to read, and the rules of definite assignment are a particularly stark example of that statement, but you can find the answers to this first interpretation of "why" here: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/variables#precise-rules-for-determining-definite-assignment

You'll note that in general, the only way a loop control structure will lead to definite assignment is if the expression that controls the loop itself is participating in definite assignment. This hits the "Definitely assigned after true expression" and "Definitely assigned after false expression" sub-states scenario. You'll also note that this part of the specification doesn't apply to your examples.

So you're left with the main point of the definite assignment rules for loops (there are other qualifications, but none apply in the simple cases):

v has the same definite assignment state at the beginning of expr as at the beginning of stmt.

I.e. whatever v was before the loop, it's the same after. The loop itself is ignored.
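To make that concrete, here is a minimal sketch; the class and method names are made up for illustration. The rejected form of the question's loop appears only in a comment, since it would not compile. The two methods below it should compile: the first gives a an initial value at declaration, and the second uses a do loop. As far as I can tell from the definite assignment rules for do statements, the body of a do loop is analysed before its test, so an assignment in the body carries through to the code after the loop.

```csharp
using System;

public static class LoopAssignmentSketch
{
    // Rejected by the compiler (CS0165 on the last line), because the
    // control expression "i < 1" is not a constant expression:
    //
    //     int a;
    //     for (int i = 0; i < 1; i++)
    //         a = 2;
    //     a /= 2;

    // Workaround 1: initialise at declaration, so a is definitely
    // assigned before the loop is ever considered.
    public static int ForLoopWithInitialiser()
    {
        int a = 0;
        for (int i = 0; i < 1; i++)
            a = 2;
        a /= 2;
        return a;
    }

    // Workaround 2: a do loop runs its body before testing the
    // condition, so the assignment in the body reaches the end point.
    public static int DoLoop()
    {
        int a;
        int i = 0;
        do
        {
            a = 2;
            i++;
        } while (i < 1);
        a /= 2;
        return a;
    }

    public static void Main()
    {
        Console.WriteLine(ForLoopWithInitialiser()); // prints 1
        Console.WriteLine(DoLoop());                 // prints 1
    }
}
```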

So, if loops don't generally create definite assignment, why do loops controlled by literal values (i.e. "constant expressions") allow for definite assignment? This is because of a different part of the specification, referenced by the rules for definite assignment: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/statements#end-points-and-reachability

The flow analysis takes into account the values of constant expressions (Constant expressions) that control the behavior of statements, but the possible values of non-constant expressions are not considered.

The flow analysis is done to determine reachability for a statement or loop end point, but this becomes directly applicable for definite assignment:

  • The definite assignment state of v at the end point of a block, checked, unchecked, if, while, do, for, foreach, lock, using, or switch statement is determined by checking the definite assignment state of v on all control flow transfers that target the end point of that statement. If v is definitely assigned on all such control flow transfers, then v is definitely assigned at the end point of the statement. Otherwise; v is not definitely assigned at the end point of the statement. The set of possible control flow transfers is determined in the same way as for checking statement reachability [emphasis mine]

In other words, the compiler applies the same analysis it uses for statement reachability when determining definite assignment. Hence, when a loop is controlled by a constant expression, the value of that expression is taken into account; when it is controlled by a non-constant expression, it is not.
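As a rough sketch of what that means in practice (the class and method names are hypothetical), each method below leaves a unassigned at declaration and relies only on a constant control expression. The first and third mirror snippets from the question. The second assumes, as I understand the rules, that a const local counts as a constant expression, so declaring alwaysTrue as const should be enough to make the rejected snippet from the question compile.

```csharp
using System;

public static class ConstantExpressionSketch
{
    // Literal true: the "condition is false" path is known not to exist.
    public static int LiteralIf()
    {
        int a;
        if (true)
            a = 2;
        a /= 2;
        return a;
    }

    // Same shape as the rejected snippet in the question, except that
    // alwaysTrue is a const local rather than an ordinary variable.
    public static int ConstLocalIf()
    {
        int a;
        const bool alwaysTrue = true;
        if (alwaysTrue)
            a = 2;
        a /= 2;
        return a;
    }

    // The loop condition is the constant true, so the only way to reach
    // the code after the loop is the break, which comes after a = 2.
    public static int ConstantLoopWithBreak()
    {
        int a;
        for (int i = 0; true; i++)
        {
            a = 2;
            if (i >= 10)
                break;
        }
        a /= 2;
        return a;
    }

    public static void Main()
    {
        Console.WriteLine(LiteralIf());             // prints 1
        Console.WriteLine(ConstLocalIf());          // prints 1
        Console.WriteLine(ConstantLoopWithBreak()); // prints 1
    }
}
```

Removing const from alwaysTrue brings back the error on a /= 2, because the compiler no longer considers the value of the control expression.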

The harder way to interpret "why" is "why did the language authors write the specification this way?" That's where you start to get into opinion-based answers, unless you're actually talking to one of the language authors (who may in fact at some point post an answer, so…not remotely out of the realm of possibility :) ).

But, it seems to me that there are a couple of ways to address that question:

  • They probably wrote the specification that way because, as complicated as the definite assignment rules are now, they would have been even more complicated if the compiler were required to do static flow analysis on variables, never mind how much more complicated actually writing the compiler would have been.
  • More theoretically, it comes down to the Halting Problem. I.e. as soon as you start asking the compiler to do non-trivial flow analysis, you open the door for someone to write some C# code that effectively makes the compiler determine whether the C# code can halt or not. Since that's impossible to do in all cases, it's probably a bad idea to include that requirement in the specification.

Dealing with constant expressions, which not only can but must be computed at compile time, is one thing. Making the compiler essentially run your program just to compile it is a whole 'nother ball o' wax.

Peter Duniho