Bug with detection of unassigned local variables (when dynamic variables affect code flow prediction)

Question

The documentation implies that out parameters do not need to be initialized (only declared) before they are passed to the method. However, this code:

class Program
{
    static void Main()
    {
        dynamic p = "";
        string s;
        if (p != null && T(out s))
            System.Console.WriteLine(s);
    }

    static bool T(out string s)
    {
        s = "";
        return true;
    }
}

Gives the build error:

Use of unassigned local variable 's'

only when p is dynamic. If p is typed as string or object, no error is produced.

Method T is required to set the variables before returning, so this error seems like hogwash to me. (Note that even with a short-circuiting &&, the second expression has to execute in order for the "then" block to execute.)

Note: you can also download this repro repo to reproduce.

So, is this a legitimate bug (I'm on C# 7.0)? How should I handle this?

Are you using those variables in the else block? Because I did not get the error with the code provided, and the only explanation I have is that the `TryGetURLParams()` is not executed due to `page != null` being false. — clcto, Sep 27 '18 at 17:07
I've tried your code in several different versions of C# and I cannot reproduce the problem. Please provide a small **complete** program that demonstrates the problem and also the **exact** version number of the compiler / Visual Studio / and so on that you are using. — Eric Lippert, Sep 27 '18 at 17:07
I suspect that @clcto is correct; you are not compiling the code you're showing us. That is the error you would get if the variables were used in an `else`, where they might not be assigned because `page` could be null. — Eric Lippert, Sep 27 '18 at 17:10
Are you sure that you specified proper version of C# in the advanced build settings ? — Dmytro Mukalov, Sep 27 '18 at 17:10
@EricLippert, really? try this solution: https://github.com/Narvey/SORepro/tree/add05ba077e526cdfb9f2c1623d4b702356f7296 — NH., Sep 27 '18 at 22:46
**The problem only repros if `page` is `dynamic`, which you have inconveniently omitted from the question**. That fact was revealed by the repro which you posted; please post repros *in the question* and not in external web sites. — Eric Lippert, Sep 27 '18 at 22:48
@EricLippert, thank you, I'm sorry I missed that all-important detail the first time. — NH., Sep 27 '18 at 22:58
The proper workaround is to not use dynamic in the null comparison. If you did `if (((object)p) != null && ...` then you would not have the problem. — Eric Lippert, Sep 27 '18 at 23:00
I can't post an answer until the question is reopened, but briefly: your conclusion that the method body cannot be entered until both halves of the `&&` are executed is false in general. **Most of the time that is true** but there are weird cases where it is not true, and **making one of the operands dynamic means that the compiler no longer has any evidence that we are not in a weird case**, so it has to default to the more conservative behaviour. — Eric Lippert, Sep 27 '18 at 23:20
Here's an example of a weird case: https://dotnetfiddle.net/NFtqTg. We enter the body of the `if` even though `T()` is never called. This program is really crazy, and no sensible person would ever write this code, but it is *possible*, and so the compiler must assume that T() is *possibly not* called in your case, and therefore the `out` parameter is possibly never initialized. — Eric Lippert, Sep 27 '18 at 23:21
It is extremely instructive to study my crazy program; you will learn a lot about the weird corner cases of C#'s rules for user-defined conversions and operator overloading. For some thoughts on the design of these rules, see my 2012 article on the subject: https://ericlippert.com/2012/04/19/null-is-not-false-part-three/; the rest of that series is also relevant, so maybe start from the beginning. — Eric Lippert, Sep 27 '18 at 23:23
In case it is not clear from the text of the program what is going on, imagine if instead I gave you this program fragment: `C c1 = P.OperatorEquals(p, null); C c2; bool b1 = C.OperatorFalse(c1); if (b1) c2 = c1; else { bool b2 = T(); C c3 = C.ImplicitConversion(b2); c2 = C.OperatorAnd(c1, c3); } bool b3 = C.OperatorTrue(c2); if (b3) { ... }`. **That program has the same control flow as the control flow generated by the compiler for my crazy program**. — Eric Lippert, Sep 27 '18 at 23:40
Plainly the body `...` can be entered without calling `T()` if `b1` and `b3` are both `true`. The compiler has no reason to believe that they will not both be true, and in fact, I've given you a sample program where they *are* both true because `operator true` and `operator false` both always return `true`! Again, no sensible person would write a program where op true and op false were *not opposites*, but it is *possible*, and so the compiler must reason that `T()` might not be called even if the `if` body is entered. — Eric Lippert, Sep 27 '18 at 23:42
Also, note that in my crazy program if you make `p` of type `dynamic` then the runtime *generates the code that has the logic above at runtime*. This is in my opinion the most impressive characteristic of `dynamic` in C#: that the runtime behaviour you get is almost always *exactly* the runtime behaviour you *would* have got had the types all been known at compile time. That was not easy code to write! — Eric Lippert, Sep 27 '18 at 23:50
@EricLippert the question is now re-opened if you want to migrate your answer from the comments. It is very interesting. — clcto, Sep 28 '18 at 13:37

Eric Lippert · Answer 1 · 2018-11-19T17:38:48.457

UPDATE: This question was the subject of my blog in November 2018. Thanks for the interesting question!

The documentation implies that out parameters do not need to be initialized (only declared) before they are sent to the method.

That's correct. Moreover, a variable passed to an out parameter is definitely assigned when the call returns, because as you note:

Method T is required to set the variables before returning, so this error seems like hogwash to me

Seems that way, doesn't it? Appearances can be deceiving!

Note that even with a short-circuiting &&, the second expression has to execute in order for the "consequence" block of the if to execute.

That is, surprisingly, false. There is a way for the consequence to execute even if the call to T does not execute. Doing so requires us to seriously abuse the rules of C#, but we can, so let's do it!

Instead of

    dynamic p = "";
    string s;
    if (p != null && T(out s))
        System.Console.WriteLine(s);

We'll do

    P p = new P();
    if (p != null && T())
        System.Console.WriteLine("in the consequence");

and give a definition for class P that causes this program to run the consequence but not run the call to T.

The first thing we have to do is turn p != null into a method call instead of a null check, and that method must not return bool:

class P
{
    public static C operator ==(P p1, P p2)
    {
        System.Console.WriteLine("P ==");
        return new C();
    }
    public static C operator !=(P p1, P p2)
    {
        System.Console.WriteLine("P !=");
        return new C();
    }
}

We are required to overload both == and != at the same time in C#. Overriding Equals and GetHashCode is a good idea but not a requirement, and nothing in this program is a good idea so we'll skip that.

OK, so we now have if (something_of_type_C && T()), and since C is not bool, we'll need to overload the && operator. But C# does not allow you to overload the && operator directly. Let's digress a moment and talk about the semantics of &&. For Boolean-returning functions A and B, the semantics of bool result = A() && B(); are:

bool a = A();
bool c;
if (a == false) // interesting operation
  c = a;
else
{
  bool b = B(); 
  c = a & b;    // interesting operation
}
bool r = c;

So we generate three temporaries, a, b, and c, we evaluate the left side A(), we check to see if a is false. If it is, we use its value. If not, we compute B() and then compute a & b.
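To make that expansion concrete, here is a minimal sketch that runs the hand-expanded form next to the built-in operator and records which operands were evaluated. The class name ShortCircuitSketch, the Log list, and the parameterized A and B helpers are assumptions added for illustration; they are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;

static class ShortCircuitSketch
{
    // Records which operands were evaluated, in order.
    public static readonly List<string> Log = new List<string>();

    static bool A(bool value) { Log.Add("A"); return value; }
    static bool B(bool value) { Log.Add("B"); return value; }

    // Hand-expanded form of: bool r = A(left) && B(right);
    public static bool Expanded(bool left, bool right)
    {
        bool a = A(left);
        bool c;
        if (a == false)
            c = a;              // B is never evaluated on this path
        else
        {
            bool b = B(right);
            c = a & b;          // non-short-circuiting &
        }
        return c;
    }

    // The built-in operator, for comparison.
    public static bool BuiltIn(bool left, bool right)
    {
        return A(left) && B(right);
    }

    public static void Main()
    {
        foreach (bool left in new[] { false, true })
        {
            Log.Clear();
            bool expanded = Expanded(left, true);
            string expandedTrace = string.Join(",", Log);

            Log.Clear();
            bool builtIn = BuiltIn(left, true);
            string builtInTrace = string.Join(",", Log);

            Console.WriteLine("left=" + left
                + " expanded=" + expanded + " [" + expandedTrace + "]"
                + " builtIn=" + builtIn + " [" + builtInTrace + "]");
        }
    }
}
```

In both forms, B should appear in the trace only when the left operand is true, which is the guarantee the question was relying on for bool operands.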

The only two operations in that workflow that are specific to the type bool are the check for falsity and the non-short-circuiting &, so those are the operations that are overloaded in a user-defined &&. C# requires you to overload three operations: user-defined &, user-defined "am I true?" and user-defined "am I false?". (Like == and !=, the last two have to be defined in pairs.)

Now, a sensible person would write operator true and operator false so that they always returned opposites. We are not sensible people today:

class C
{
    public static bool operator true(C c)
    {
        System.Console.WriteLine("C operator true");
        return true;
    }

    public static bool operator false(C c)
    {
        System.Console.WriteLine("C operator false");
        return true; // Oops
    }

    public static C operator &(C a, C b)
    {
        System.Console.WriteLine("C operator &");
        return a;
    }
}

Notice that we also require that user-defined & take two Cs and return a C, which it does.
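For contrast, here is a sketch of what a sensible type might look like: operator true and operator false are always opposites, so the user-defined && short-circuits the way the bool version does. The Flag class and the SensibleSketch harness are hypothetical names invented for this illustration, not code from the answer.

```csharp
using System;

// Hypothetical wrapper around bool whose operator true and
// operator false always return opposite results.
class Flag
{
    private readonly bool value;
    public Flag(bool value) { this.value = value; }

    public static bool operator true(Flag f) { return f.value; }
    public static bool operator false(Flag f) { return !f.value; }

    // User-defined & must take two Flags and return a Flag
    // for Flag && Flag to be allowed.
    public static Flag operator &(Flag a, Flag b)
    {
        return new Flag(a.value & b.value);
    }
}

static class SensibleSketch
{
    // Counts how often the right-hand operand was evaluated.
    public static int RightEvaluations;

    static Flag Right()
    {
        RightEvaluations++;
        return new Flag(true);
    }

    // Returns true when the body of the if was entered.
    public static bool Enters(bool left)
    {
        if (new Flag(left) && Right())
            return true;
        return false;
    }

    public static void Main()
    {
        Console.WriteLine(Enters(false) + ", right evaluated " + RightEvaluations + " time(s)");
        Console.WriteLine(Enters(true) + ", right evaluated " + RightEvaluations + " time(s)");
    }
}
```

With a type like this, the body is entered only on the path where the right-hand operand was evaluated. The compiler has no way to know that a user-defined type behaves this well, which is why the class C above can break the assumption.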

All right, so, recall we had

if (p != null && T())

and p != null is of type C. So we must now generate this as:

C a = p != null; // Call to P.operator_!=
C c;
bool is_false = a is logically false; // call to C.operator_false
if (is_false) 
  c = a;
else
{
  bool b = T();
  c = a & b; // Call to C.operator_&
}

But now we have a problem. operator & takes two Cs and returns a C, but we have a bool returned from T. We need a C. No problem, we'll add an implicit user-defined conversion to C from bool:

public static implicit operator C(bool b)
{
    System.Console.WriteLine("C implicit conversion from bool");
    return new C();
}

OK, so our logic is now:

C a = p != null; // Call to P.operator_!=
C c;
bool is_false = C.operator_false(a);
if (is_false)
  c = a;
else
{
  bool t = T(); 
  C b = t; // call to C.operator_implicit_C(bool)
  c = a & b; // Call to C.operator_&
}

Remember what we are heading towards here is:

if (c)
  System.Console.WriteLine("in the consequence");

How do we compute this? C# reasons that if you have operator true on C then you should be able to use it in an if condition by simply calling operator true. So finishing it off, ultimately we have the semantics:

C a = p != null; // Call to P.operator_!=
C c;
bool is_false = C.operator_false(a);
if (is_false)
  c = a;
else
{
  bool t = T(); 
  C b = t; // call to C.operator_implicit_C(bool)
  c = a & b; // Call to C.operator_&
}
bool is_true = C.operator_true(c);
if (is_true) …

But as this crazy example shows, we can enter the consequence of the if without ever calling T, provided that operator false and operator true both return true. When we run the program we get:

P !=
C operator false
C operator true
in the consequence
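For convenience, the pieces above can be assembled into a single program along these lines. The classes P and C are taken from the answer; the body of T, the TWasCalled flag, and the CrazyProgram class name are assumptions added here so that the skipped call is observable. Expect compiler warnings about P defining == and != without overriding Equals and GetHashCode.

```csharp
using System;

class P
{
    public static C operator ==(P p1, P p2)
    {
        Console.WriteLine("P ==");
        return new C();
    }
    public static C operator !=(P p1, P p2)
    {
        Console.WriteLine("P !=");
        return new C();
    }
}

class C
{
    public static bool operator true(C c)
    {
        Console.WriteLine("C operator true");
        return true;
    }
    public static bool operator false(C c)
    {
        Console.WriteLine("C operator false");
        return true; // Deliberately not the opposite of operator true
    }
    public static C operator &(C a, C b)
    {
        Console.WriteLine("C operator &");
        return a;
    }
    public static implicit operator C(bool b)
    {
        Console.WriteLine("C implicit conversion from bool");
        return new C();
    }
}

static class CrazyProgram
{
    // Assumed helper state: lets us observe whether T ever ran.
    public static bool TWasCalled;

    static bool T()
    {
        TWasCalled = true;
        Console.WriteLine("T was called");
        return true;
    }

    public static void Main()
    {
        P p = new P();
        if (p != null && T())
            Console.WriteLine("in the consequence");
        Console.WriteLine("T was called: " + TWasCalled);
    }
}
```

Because operator false reports that the left operand is false, the right operand is skipped; because operator true then reports that the same value is true, the body runs anyway.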

A sensible person would never write code where a C was considered to be both true and false at the same time, but a not-sensible person like me today could, and the compiler knows that because we designed the compiler to be correct regardless of whether the program is sensible.

So that explains why if (p != null && T(out s)) says that s can be unassigned in the consequence. If p is dynamic then the compiler reasons "p might be one of these crazy types at runtime, in which case we are no longer working with bool operands, and therefore s might not be assigned".

The moral of the story is: dynamic makes the compiler extremely conservative about what could happen; it has to assume the worst. In this particular case, it has to assume that p != null might not be a null reference check and might not be bool, and that operator true and operator false might both return true.

So, is this a legitimate bug (I'm on C# 7.0)?

The compiler's analysis is correct -- and believe me, this was not easy logic to write or test.

Your code has the bug; fix it.

How should I handle this?

If you want to do a null reference check against a dynamic, your best bet is: if it hurts when you do that, don't do that.

Cast away the dynamic and get back to object, and then do the reference equality check: if (((object)p) != null && …

Or, another nice solution is to make it extremely explicit: if (!object.ReferenceEquals((object)p, null) && …
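As a sketch of how these two workarounds might look in a complete program, using the `p != null` condition from the question: once the dynamic is cast to object, the left operand is a plain bool and the usual definite assignment rule for && applies. The method names ViaCast and ViaReferenceEquals, the class name CastWorkaround, and the string assigned by T are illustrative assumptions.

```csharp
using System;

static class CastWorkaround
{
    static bool T(out string s)
    {
        s = "assigned by T";
        return true;
    }

    public static string ViaCast(dynamic p)
    {
        string s;
        // (object)p != null is an ordinary reference comparison of type bool,
        // so the compiler knows T(out s) ran whenever the condition is true.
        if ((object)p != null && T(out s))
            return s;
        return null;
    }

    public static string ViaReferenceEquals(dynamic p)
    {
        string s;
        // The same check, spelled out explicitly.
        if (!object.ReferenceEquals((object)p, null) && T(out s))
            return s;
        return null;
    }

    public static void Main()
    {
        Console.WriteLine(ViaCast(""));
        Console.WriteLine(ViaReferenceEquals(""));
    }
}
```

Neither method performs a dynamic operation, so nothing is bound at run time here.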

Those are my preferred solutions. A worse solution is to break it up:

if (p != null)
  if (T(out string s))
     consequence

Now there is no operator & called even in the worst case. Note though in this case we can still be in a scenario where p != null is true and p is null, since there is nothing stopping anyone from overloading != to always return true regardless of its operands.
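A minimal sketch of the split form, assuming a project that references the C# runtime binder (Microsoft.CSharp), since `p != null` is still a dynamic operation here. The Nested method, the NestedWorkaround class name, and the string assigned by T are illustrative assumptions.

```csharp
using System;

static class NestedWorkaround
{
    static bool T(out string s)
    {
        s = "assigned by T";
        return true;
    }

    public static string Nested(dynamic p)
    {
        if (p != null)              // dynamic operation, bound at run time
            if (T(out string s))    // ordinary bool, so s is definitely assigned below
                return s;
        return null;
    }

    public static void Main()
    {
        Console.WriteLine(Nested(""));
        Console.WriteLine(Nested(null) ?? "(T was not reached)");
    }
}
```

The inner condition is a plain bool, so the use of s compiles; the outer condition still depends on whatever != means for the runtime type of p.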

This proves again that operator overloading is, in general, a bad feature that should never exist. Is there a case that by using operator overloading you may write a better, more readable and more maintainable code? — Luca Cremonesi, Sep 28 '18 at 15:13
@LucaCremonesi, sounds like a question for Software Engineering... oh wait, already asked: https://softwareengineering.stackexchange.com/a/136531/74174 — NH., Sep 28 '18 at 15:18
But I agree. You should never overload true/false, as even [the doc on the true operator](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/true-operator) says! (incidentally, https://stackoverflow.com/q/5203498/1739000 is kind of related) — NH., Sep 28 '18 at 15:31
Eric, one more followup. Why isn't it possible to actually make P a dynamic, if classes like P is why the compiler is so cautious about dynamics? https://dotnetfiddle.net/kWQ8Dp — NH., Sep 28 '18 at 15:37
@LucaCremonesi: I disagree with your assessment. Let's take the `+` operator for example. In C# it is severely overloaded; it can add char, int, uint, long, ulong, decimal, double, float and string, (and add anything to a string), and enums and delegates. Now, I would make the argument that this goes too far, in that string concatenation and delegate sequencing are not logically *additions*. But are you saying that C# really should have either (1) made you call a method to add together two decimals, or (2) have different punctuation for each of those flavors? — Eric Lippert, Sep 28 '18 at 17:13
@NH.: I don't understand your follow-up question; can you explain it a little more? — Eric Lippert, Sep 28 '18 at 17:15
@LucaCremonesi: If your concern is restricted to *user-defined overloaded operators*, then I submit to you that there are many valid uses for overloading an operator. `BigInteger` for example seems like a great candidate to have a `+` operator, and it would be unfortunate if the developers of `BigInteger` could not supply such an operator. The problem I have with user-defined operator overloading is that the designers of C++ made the horribly awful decision to make the first introduction most developers have to the feature the `<<` and `>>` operators on streams. — Eric Lippert, Sep 28 '18 at 17:17
@LucaCremonesi: That decision basically communicated "the purpose of operator overloading is to make "cute" new meanings for existing operators that are completely divorced from any existing semantics you might already have". So people get the idea that, oh, the "right way to do it" in C++ is to say "let's make customer plus product equal purchase order" or some such nonsense that has nothing to do with the semantics of addition. The fact that the feature can be abused is not unique to user-defined operator overloading though. — Eric Lippert, Sep 28 '18 at 17:19
@EricLippert, I was saying you are arguing that the compiler plays it safe with the dynamic, as it might actually be some weird class P that does terrible things it shouldn't do, and then you go on to define P, but forget about the fact that in my example, it was dynamic, not P. — NH., Sep 28 '18 at 17:20
@NH.: Ah, I think you may have missed my point. The fundamental design characteristic we put on `dynamic` was that *operations on an expression of type `dynamic` have exactly the same semantics at runtime as if the compiler had correctly known the types at compile time*. This puts a constraint on the compiler that if a program containing `dynamic` would have been illegal for *any type*, it must give an error at either compile time or run time. This is such a case, albeit a really unusual case. Because there exists such a type P, dynamic must behave like it regardless of what it really is! — Eric Lippert, Sep 28 '18 at 17:33
@NH.: Let me put that another way. If you took my little program and said `dynamic p = new P();` instead of `P p = new P();` the resulting program would have *exactly the same behaviour*, it would just work out that behaviour at run time, not compile time. And since it must have that behaviour in *my* program, where I use `P`, then it must also have the same behaviour in *your* program, because the compiler just sees `dynamic` and assumes the worst. — Eric Lippert, Sep 28 '18 at 17:35
@EricLippert: Beyond how esoteric and gross it is to use operators like that, part of the reason this particular scenario feels a bit weird is that it violates the mental shortcut, "Dynamic is kind of like object, but it runs the compiler at runtime. If the compile fails, you get a runtime exception." OP's example generates a compiler error, even though usually compiler errors involving dynamic translate into possible runtime exceptions. It feels slightly odd that "Use of unassigned local variable" is a compiler error instead of a possible runtime error. — Brian, Sep 28 '18 at 18:30
@EricLippert: As you've explained, actually turning this specific "use of unassigned local variable" into a runtime error doesn't match up at all with how dynamic actually works. However, I think there's value in being able to point out exactly how engineering intuitions can lead one astray; I've often seen you ask developers to explain how they came to their wrong conclusions. — Brian, Sep 28 '18 at 18:36
@Brian: I agree that the whole scenario is weird. One of the problems that language designers have to deal with all the time is exactly this sort of tradeoff. As languages become more complex, the interactions between existing and new features also become more complex, and you also have to think about *future* feature interactions. (And it's hard to make a prediction, especially about the future.) — Eric Lippert, Sep 28 '18 at 18:39
@EricLippert: I was referring to the user-defined overloaded operators. I agree that it could still be appropriate and perfectly readable for numeric types such as `Complex` or `BigInteger`. Even `DateTime\Offset` and `TimeSpan` can, in a sense, be considered special numeric types, as they are a `long` (plus an offset/kind in the `DateTime\Offset`). However I think that any other type that does not represent a numeric type and does not "naturally" have the operations overloaded (e.g. sum) should not be able to overload the operators. A specific function is more appropriate in these cases. — Luca Cremonesi, Sep 28 '18 at 19:10
@EricLippert: Unfortunately I don't know how it could be possible to allow/prevent operator overloading only for specific types and for specific operators. I feel it is a feature that is critically disruptive for code readability and maintainability, and bug prone if it is not implemented perfectly. On the other hand, the great majority of languages support operator overloading, so I may well be wrong on this topic. — Luca Cremonesi, Sep 28 '18 at 19:21
@LucaCremonesi: I really do put it on C++; its design set the tone for this feature and it has been difficult to get away from that. What is really vexing is: it didn't have to be that way at all. The designers of C++ had every opportunity to introduce a *new* operator, say `>>=`, that would be overloadable and have the semantics of a monadic bind, and then it would be natural and *by design* to build composable workflow classes; you could then use *that* thing for IO streaming. I'm not saying go full on Haskell, but I'm saying that there were other choices that weren't made. — Eric Lippert, Sep 28 '18 at 19:44
@Brian: In particular, consider the design of the "dynamic" feature; it is by design a *local* feature. When you do `foo.bar()` and `foo` is of type `dynamic`, we expect that the code generated at the call site will be some dynamic dispatch. And if there is no "bar" at runtime, then we'd expect a runtime error. But it would be quite strange if an error showed up *at a completely different code location that was not itself at all dynamic*. The dynamic feature does not mean that *your entire method* or *your entire program* is analyzed at runtime, just *the dynamic expression*. — Eric Lippert, Sep 28 '18 at 21:37
OK, all fancy talk how "compiler is right" and "language is wrong" is actually wrong :-) When shortening a circuit, compiler should force determined true/false instead of blindly re-touching a non-bool. 3-val logic will still stand. This would also provide a stronger guarantee - that no object from the shorted expression will be touched until it's immed. scope "(expr)" is left. Also, for dynamic, it should retain and follow static data flow info when deductible. It does so for assignment (assign int and then float and it will notice and yell fault even though it's declared dynamic). — ZXX, Feb 05 '19 at 11:13
