
I have been looking into why some code wasn't working. Everything looks fine except for the following line.

Transport = Transport?? MockITransportUtil.GetMock(true);

Before that line is executed, Transport is null. I can see GetMock being executed and returning a non-null object. After that line, Transport is still null.

I looked at the IL that was generated and it looks fine to me.

  IL_0002:  ldarg.0
  IL_0003:  ldfld      class [Moq]Moq.Mock`1<class [CommLibNet]CommLibNET.ITransport> Curex.Services.Common.UnitTests.Messaging.TestIGuaranteedSubscriptionBase::Transport
  IL_0008:  dup
  IL_0009:  brtrue.s   IL_0012
  IL_000b:  pop
  IL_000c:  ldc.i4.1
  IL_000d:  call       class [Moq]Moq.Mock`1<class [CommLibNet]CommLibNET.ITransport> Curex.Services.Common.UnitTests.Mocking.MockITransportUtil::GetMock(bool)
  IL_0012:  stfld      class [Moq]Moq.Mock`1<class [CommLibNet]CommLibNET.ITransport> Curex.Services.Common.UnitTests.Messaging.TestIGuaranteedSubscriptionBase::Transport

We see the function get called, and stfld should take the return value and set the field.

So I then looked at the disassembly. I see the call being made, but it looks like the return value in RAX gets blown away by the next call and is lost.

            Transport = Transport?? MockITransportUtil.GetMock(true);
000007FE9236F776  mov         rax,qword ptr [rbp+0B0h]  
000007FE9236F77D  mov         rax,qword ptr [rax+20h]  
000007FE9236F781  mov         qword ptr [rbp+20h],rax  
000007FE9236F785  mov         rcx,qword ptr [rbp+20h]  
000007FE9236F789  mov         rax,qword ptr [rbp+0B0h]  
000007FE9236F790  mov         qword ptr [rbp+28h],rax  
000007FE9236F794  test        rcx,rcx  
000007FE9236F797  jne         000007FE9236F7AC  
000007FE9236F799  mov         cl,1  
000007FE9236F79B  call        000007FE92290608  

            //var x = ReferenceEquals(null, Transport) ? MockITransportUtil.GetMock(true) : Transport;
            ListerFactory = ListerFactory ?? MockIListenerUtil.GetMockSetupWithAction((a) => invokingAction = a);
000007FE9236F7A0  mov         qword ptr [rbp+30h],rax  
000007FE9236F7A4  mov         rax,qword ptr [rbp+30h]  
000007FE9236F7A8  mov         qword ptr [rbp+20h],rax  
000007FE9236F7AC  mov         rcx,qword ptr [rbp+28h]  

If I use an if statement or the ?: operator, everything works fine.
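
For comparison, the three forms can be sketched side by side as below. `Holder`, `Widget` and `Create` are hypothetical stand-ins (for the test class, the `Mock<ITransport>` type and `MockITransportUtil.GetMock`), not the real code. As far as the C# language is concerned all three should leave the field non-null, so any difference in behaviour would have to come from the JIT or the debugger.

```csharp
using System;

// Hypothetical stand-in for Mock<ITransport>.
class Widget { }

class Holder
{
    public Widget Transport;

    // Hypothetical stand-in for MockITransportUtil.GetMock(true).
    static Widget Create(bool flag)
    {
        return new Widget();
    }

    // Original form: null-coalescing result assigned back to the same field.
    public void WithCoalesce()
    {
        Transport = Transport ?? Create(true);
    }

    // Workaround 1: explicit if statement.
    public void WithIf()
    {
        if (Transport == null)
        {
            Transport = Create(true);
        }
    }

    // Workaround 2: conditional (?:) operator.
    public void WithConditional()
    {
        Transport = Transport != null ? Transport : Create(true);
    }
}

static class Program
{
    static void Main()
    {
        var a = new Holder();
        a.WithCoalesce();
        var b = new Holder();
        b.WithIf();
        var c = new Holder();
        c.WithConditional();

        // Report whether each form left the field assigned.
        Console.WriteLine("??  assigned: " + (a.Transport != null));
        Console.WriteLine("if  assigned: " + (b.Transport != null));
        Console.WriteLine("?:  assigned: " + (c.Transport != null));
    }
}
```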

Visual Studio 2013

EDIT

I have created a pseudo-minimal reproduction.

// Empty placeholder types so the snippet compiles.
class A { }
class B { }

class simple
{
    public A MyA = null;
    public B MyB = null;

    public void SetUp()
    {
        MyA = MyA ?? new A();
        MyB = new B(); // Put breakpoint here
    }
}

If you set a breakpoint on the indicated line and look at the value of MyA in the debugger, it will still be null (only when building for x64). If you execute the next line, it will set the value. I have not been able to reproduce the assignment not happening at all. It's very clear in the disassembly that execution of the next line has begun before the assignment takes place.

Edit 2

Here is a link to the MS Connect site

rerun
  • More code please or some location where we can download the code. I want to be able to reproduce the problem. – Pellared Mar 21 '14 at 18:14
  • Can you show a bit more of the disassembly, at least through `000007FE9236F7AC` (where the `jne` goes)? – TypeIA Mar 21 '14 at 18:15
  • I attempted to create a very small minimal reproduction, but it worked. – rerun Mar 21 '14 at 18:16
  • @dvnrrs added requested assembly – rerun Mar 21 '14 at 18:17
  • Change the processor :) Have you tried it on another computer? – Pellared Mar 21 '14 at 18:19
  • @rerun I had **exactly** this same problem before and I swore to myself that it was a compiler bug - but forgot to post it on SO. ;) I fixed it by using an `if` or `?:`, exactly the same as you're saying. – Timothy Shields Mar 21 '14 at 18:23
  • I'm stumped. I agree with @Pellared, I'd like to be able to reproduce it. I'd also be curious if changing the [JIT optimization setting](http://msdn.microsoft.com/en-us/library/ms241594.aspx) had any effect, or if targeting x86 instead of x64 would demonstrate the same behavior. – TypeIA Mar 21 '14 at 18:24
  • It's in a unit test, x64 debug. – rerun Mar 21 '14 at 18:29
  • @rerun I should add that when I tried to create a minimal case in which this weird behavior occurred, I was unable to do so. It was only happening deep in my large codebase. But the variable it was happening to was a `double? x` - something along the lines of `x = x ?? 1.0;`. If `x` was `null` when that line was reached, it was still `null` afterwards. – Timothy Shields Mar 21 '14 at 18:30
  • @TimothyShields exactly the same behaviour – rerun Mar 21 '14 at 18:49
  • @rerun Then I'm not crazy. :) Paging Eric Lippert... – Timothy Shields Mar 21 '14 at 18:50
  • @dvnrrs Shields updated with a pseudo-minimal repro – rerun Mar 21 '14 at 23:57
  • subscribing as I am curious... – Ahmed ilyas Mar 22 '14 at 00:07
  • Watch out, there is a problem with the 64-bit debugger, [see here](http://stackoverflow.com/questions/19352130/why-doesnt-the-null-coalescing-operator-work-in-this-situation/19352399#19352399) – Guvante Mar 22 '14 at 00:11
  • @Guvante It's the same problem in a different form, but it's not the debugger. According to the assembly, it's generating the field assignment in the wrong place. It looks like it might be performing optimizations when it should not. – rerun Mar 22 '14 at 01:41
  • @rerun Yep, it's not the debugger. I was getting flat-out incorrect program behavior. – Timothy Shields Mar 22 '14 at 20:54

2 Answers

    MyB = new B();// Put breakpoint here

The problem is the breakpoint, not the code generation. The x64 jitter flubs this: it generates inaccurate debugging info. It emits the line number info for the statement incorrectly, using a code address that's still part of the previous statement.

You can tell from the disassembly you posted: the code at addresses F7A0 through F7A8 is still part of the ?? statement. The branch to F7AC is the real one; that's where the next statement starts. So it should have said that F7AC was the start of the next statement, not F7A0.

One consequence of this bug is that the debugger may never stop at the breakpoint. You can see this for yourself by altering your repro code to write public A MyA = new A(); If it does stop, then the assignment hasn't been executed yet, so you still see the variable holding its previous value, null in your case. A single step resolves it, although that depends on what the next statement looks like.

Rest assured that this only goes wrong when you debug; the program still operates correctly. Just keep this quirk in mind. Afaik it only goes wrong for the ?? operator. You can tell it doesn't get used much :) Then again, most programmers only ever debug the 32-bit version of their program, since the default project settings heavily encourage it.
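
One way to tell a debugger artifact apart from real misbehaviour is to check the field from code once the method has returned, instead of inspecting it at the breakpoint. The sketch below reuses the `simple` class from the question; the empty `A` and `B` types and the `Program` wrapper are assumptions added only so it compiles. If the diagnosis above applies, both fields should be reported as assigned even in an x64 debug build; a `False` for `MyA` would point to something beyond bad line-number info.

```csharp
using System;

// Empty placeholder types (assumed) so the repro compiles.
class A { }
class B { }

class simple
{
    public A MyA = null;
    public B MyB = null;

    public void SetUp()
    {
        MyA = MyA ?? new A();
        MyB = new B();
    }
}

static class Program
{
    static void Main()
    {
        var s = new simple();
        s.SetUp();

        // Confirm which jitter is in play before drawing conclusions.
        Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);

        // Observe the fields from code, after SetUp has fully returned.
        Console.WriteLine("MyA assigned: " + (s.MyA != null));
        Console.WriteLine("MyB assigned: " + (s.MyB != null));
    }
}
```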

The problem is being addressed as we speak. Don't expect your Connect report to have an effect; Microsoft is well aware of this bug. The jitter team at Microsoft has rewritten the x64 jitter completely, and it is currently in CTP2. I'd estimate another year or so before it is released.

Hans Passant
  • This isn't the problem. I've had the same problem as the asker before, and I was getting incorrect program behavior because of it, not just debugger problems. Furthermore, the behavior was inconsistent depending on what machine I was on. The `??` would fail on one machine and work properly on another, which points to there being a bug in the JIT. – Timothy Shields Mar 22 '14 at 20:53
  • One bug at a time please, you can report your own. – Hans Passant Mar 22 '14 at 21:52
  • I only saw this problem because the value made it out of the posted function as null. Once I changed the program at all, the assignment would occur. – rerun Mar 23 '14 at 01:34
  • @HansPassant In rerun's linked MS Connect bug report, he writes "**In production code I was able to see the assignment never happen.** in a minimal reproduction I can't get this to occur." (emphasis added) That's the exact same problem I had. In my production code the null coalescing operator was outright broken and I had to switch to alternative syntax. When I tried to make a toy program that reproduced the problem, I was unable to. The same is happening to rerun. So it looks like we encountered the same bug. – Timothy Shields Mar 23 '14 at 06:27
  • Well, another aspect of this bug is what's going to happen in the Release build with the optimizer turned on. I can see it possible that the optimizer will eliminate the right-hand side expression of the ?? operator when it thinks it is not being used. Depends a great deal on what the expression looks like, it won't optimize a constructor call away for example. Pretty hard to nail that down. – Hans Passant Mar 23 '14 at 09:59
  • So that's interesting, because I found this when I changed the code from calling a constructor to calling a static function that did construction and some setup. I was also wondering if this is an issue with the JIT not realizing it's in debug and therefore that it should not reorder code. Also, I don't understand why ?? is the only issue, as an equivalent if statement should create almost identical IL. – rerun Mar 24 '14 at 14:11

I got an update from MS that this was indeed a real problem and that it has been fixed in the upcoming release of the x64 jitter.

rerun