
EDIT: The disassembly of both cases is also attached, obtained as suggested in the comments via Tools > Options > Debugging > 'Suppress JIT optimization on module load' > uncheck (clearing this option lets the JIT optimize even while debugging). It can be seen that the generated machine code is worse when the pseudo code is present: in the fast version the code works directly on CPU registers, but in the other version it loads p1.x and p1.y from memory on every iteration! I don't know how we could control such behavior.

While trying to measure the execution time of a block of code for performance testing, I found some irregularities that confused me: if I add a block of (pseudo) code before the actual measuring loop, the total elapsed time increases by a large factor (from 120 ms to 220 ms on my 2 GHz PC, i.e. about 1.8 times slower!). Note that in the real case, this pseudo code stands in for different code required for initialization or similar purposes.

NOTE: I build the app in Release mode and run it with Start Without Debugging (Ctrl+F5) to ensure all optimizations are applied and I get the best results.

Here is a simplified scenario as a sample:

void test()
{
    Point p1 = new Point(1, 2); // System.Drawing.Point is a struct (value type)
    Point p = p1;
    //*** !!! Comment & UnComment the following 2 lines to see difference: ***
    p.Offset(p1); p.Offset(p1); p.Offset(p1); p.Offset(p1);
    p.Offset(p1); p.Offset(p1); p.Offset(p1); p.Offset(p1);

    Stopwatch timer = new Stopwatch();
    double dt = -1;

    for (int repeat = 1; repeat <= 5; repeat++)
    {
        p = p1; //here we reset the initial p

        timer.Restart();
        for (int i = 0; i < 50000000; i++)
        {
            p.Offset(p1);
        }
        timer.Stop();
        dt = timer.ElapsedMilliseconds;

        textBox1.Text += repeat + "] " + dt + ", p: " + p + "\r\n";
        Application.DoEvents(); if (this.IsDisposed) break;
    }
}

private void button1_Click(object sender, EventArgs e)
{
    test();
}

This issue prevents me from comparing the results I need for optimization. By the way, this may be related to struct types(?).

Do you know what the reason is and how to solve it?

EDIT (4 Apr.): I checked and found that this behavior only happens for struct types, not class types. As we know, structs are value types usually stored on the stack rather than the heap; this may play a role here, but I don't know.
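
For illustration, here is a rough sketch of the kind of class-based counterpart I compared against (the PointClass type below is just a made-up reference type for this test, not a framework type):

// A made-up reference-type stand-in for System.Drawing.Point, used only to
// compare struct vs. class behavior in the benchmark above.
class PointClass
{
    public int X, Y;
    public PointClass(int x, int y) { X = x; Y = y; }
    public PointClass(PointClass other) { X = other.X; Y = other.Y; } // explicit copy
    public void Offset(PointClass other) { X += other.X; Y += other.Y; }
    public override string ToString() { return "{X=" + X + ",Y=" + Y + "}"; }
}

In test() above, Point then becomes PointClass, and the copies `Point p = p1;` and `p = p1;` become `p = new PointClass(p1);`, because assigning a class copies the reference rather than the value.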

Another note: I also found that if I uncheck the 'Optimize code' option in Project Properties > Build tab, the code actually runs faster!

The screenshots of the two cases are shown below: Run Screenshot

The disassembly: Disassembly Screenshot

S.Serpooshan
  • Just to be sure - you're using Release builds, outside of VS? – MarcinJuraszek Apr 03 '16 at 06:48
  • I can't reproduce the issue. My results vary from 100 to 600 within the same run session, no matter whether the 2 lines are commented or not. Release build, no debugger. – Diligent Key Presser Apr 03 '16 at 06:55
  • NOTE: I build the app in `release mode` and press `Start Without Debug`. Without these settings, the results are not optimized by Visual Studio and vary each time! – S.Serpooshan Apr 03 '16 at 07:11
  • @MarcinJuraszek: the post has also been edited to mention that it is built in Release Mode and run with the 'Start Without Debug' command. – S.Serpooshan Apr 03 '16 at 07:30
  • @DiligentKeyPresser: the Stopwatch has a Restart() method in newer .NET versions, but you can use the combination of `timer.Reset(); timer.Start();` instead. Note that `timer.Start()` alone does not reset it, so you would keep adding to the previous time on each loop! – S.Serpooshan Apr 03 '16 at 07:33
  • Longer methods are known to result in less efficient code. I trust someone can analyse the CIL for this specific case, but if you want a general explanation, see http://www.dotnetperls.com/method-size. You can try refactoring your initialization code out into another method and check whether that improves performance. – Douglas Apr 03 '16 at 08:32
  • @Douglas: thanks, but your link describes defining many more local vars (a, b, c, d, ...), which causes only a 7% slowdown because of the stack! Here we did not define any new var; we just do some operations with our local vars p & p1. If I use a for loop instead of manually copying the `p.Offset(p1);` lines, the issue disappears. I remember I had a similarly strange behavior a month ago where adding one simple line like `k2+=k1+1;` changed the overall time by a big factor! This shows the content of the code matters here! – S.Serpooshan Apr 03 '16 at 09:11
  • @saeedserpooshan: Yes, the specific scenario discussed in the link does not apply to your case. However, there are factors other than the number of local variables that affect compilation; and your extra two lines might be triggering some behaviour of the JIT that causes it to forgo certain optimizations. I'll try to find an authoritative source on this. – Douglas Apr 03 '16 at 09:25
  • The same behaviour is confirmed when refactoring the init code into another method. The difference is 81ms vs 285ms for JIT64 and 114ms vs 220ms for RyuJIT. The IL seems the same to me, but I am no expert. – Antonín Lejsek Apr 03 '16 at 09:30
  • Look at the CIL. That's the only way to get to the bottom of it. – Noldorin Apr 04 '16 at 11:36

1 Answer


Interesting question.

In these cases, the assembler always tells you the truth (not the CIL; that will be exactly the same!). It might be that your extra lines trigger a boundary condition that changes the assembler of the inner-most loop.

I changed the code into a console application, made the necessary adjustments (unchecking the 'Suppress JIT optimization' flag, etc.), built in Release mode, put a breakpoint, pressed F5 and hit Ctrl+Alt+D.

Note: I noticed I also had to increase the test case by a factor of 10 to get these timings. The timings are exactly the same, as I expected.
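
Roughly, the console version is just the same loop moved into Main (a sketch, not the verbatim code; note the 10x loop count):

// Sketch of the console harness: same measurement as the original test(),
// writing to the console instead of a TextBox (requires a reference to
// System.Drawing for the Point struct).
using System;
using System.Diagnostics;
using System.Drawing;

static class Program
{
    static void Main()
    {
        Point p1 = new Point(1, 2);
        Point p = p1;
        // Toggle these two lines to switch between the two cases:
        p.Offset(p1); p.Offset(p1); p.Offset(p1); p.Offset(p1);
        p.Offset(p1); p.Offset(p1); p.Offset(p1); p.Offset(p1);

        var timer = new Stopwatch();
        for (int repeat = 1; repeat <= 5; repeat++)
        {
            p = p1; // reset the initial p

            timer.Restart();
            for (int i = 0; i < 500000000; i++) // 10x the original count
            {
                p.Offset(p1);
            }
            timer.Stop();

            Console.WriteLine("{0}] {1} ms, p: {2}", repeat, timer.ElapsedMilliseconds, p);
        }
    }
}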

Still, there could be an issue with, for example, register allocation, so let's check that. Assembler never lies.

Case 1 assembler.

                for (int i = 0; i < 500000000; i++)
00007FFE23C74526  xor         ecx,ecx  
                {
                    p.Offset(p1);
00007FFE23C74528  inc         r14d  
00007FFE23C7452B  add         r15d,2  
                for (int i = 0; i < 500000000; i++)
00007FFE23C7452F  inc         ecx  
00007FFE23C74531  cmp         ecx,1DCD6500h  
00007FFE23C74537  jl          00007FFE23C74528  
                }

Case 2 assembler.

                for (int i = 0; i < 500000000; i++)
00007FFE23C84526  xor         ecx,ecx  
                {
                    p.Offset(p1);
00007FFE23C84528  inc         r14d  
00007FFE23C8452B  add         r15d,2  
                for (int i = 0; i < 500000000; i++)
00007FFE23C8452F  inc         ecx  
00007FFE23C84531  cmp         ecx,1DCD6500h  
00007FFE23C84537  jl          00007FFE23C84528  
                }

Conclusion

The assembler output is exactly the same, and so is the data. In other words: the performance of the inner loop is also exactly the same.

Still, do check the assembler output for yourself as I've explained. If you have another version of the .NET JIT, its output should explain the behavior.


Apparently there's a bit of confusion on how you should test emitted assembler code. Here's the correct way to do it:

  • Tools -> Options -> Debugging -> Suppress JIT optimization on module load -> uncheck (If you clear this option, you can debug optimized JIT code, but your ability to debug may be limited)
  • Disable "Just my code" as well in the same settings pane.
  • Set the build type to "release" mode. Also check configuration manager.
  • Right click project -> "Properties" -> Build tab. Check "Optimize code".
  • Optional, same tab: ensure that "Prefer 32-bit code" is unchecked. (NOTE: The 32-bit JIT and the 64-bit JIT are two different things.)

Set a breakpoint, press F5, then Ctrl+Alt+D, and pray your breakpoint is hit; otherwise set it somewhere else and try again.
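
If the breakpoint refuses to bind in the optimized code, a hard-coded break is an alternative (a small sketch; `Debugger.Break()` and `Debugger.IsAttached` are in `System.Diagnostics`):

// Sketch: force a break right before the loop we want to inspect, so it
// hits even in a fully optimized Release build run under the debugger.
using System;
using System.Diagnostics;
using System.Drawing;

static class DisasmDemo
{
    static void Main()
    {
        Point p1 = new Point(1, 2);
        Point p = p1;

        if (Debugger.IsAttached)
            Debugger.Break(); // execution stops here; now open the Disassembly window (Ctrl+Alt+D)

        for (int i = 0; i < 500000000; i++)
            p.Offset(p1); // the loop whose machine code we want to see

        Console.WriteLine(p); // print the result so p stays observable after the loop
    }
}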


I'm a bit surprised by some of the comments, so let me go over them:

  • @Douglas notes that longer methods result in less efficient code. Yes, I've actually made a bug report about this to the Microsoft CoreCLR team. This is true, but with the caveat that the JIT optimizer is only disabled when compiling very large methods; very large in this context means 60,000 lines of IL code. The exact limits can be found here: https://github.com/dotnet/coreclr/blob/01a5e9b4580cf6ea21de672f627402c30658ef22/src/jit/compiler.h#L7131 . In this case, it's totally irrelevant.
  • @Noldorin notes that the OP should look at the IL code. I would like to point out that the emitted IL code is exactly the same in both scenarios.
atlaste
  • Can you explain exactly what you mean by "made the necessary adjustments (suppress optimization flag etc)"? – S.Serpooshan Apr 06 '16 at 17:47
  • If memory serves, it's just a few compiler settings in Visual Studio; you can easily find them. Normally, running code in the debugger (or debug mode) will prevent JIT optimizations, which will give you wrong results. I also make it a habit to disable the 32-bit default jitter (you can find it under project properties). – atlaste Apr 06 '16 at 17:56
  • Did you test the code with 'Start Without Debug' as I mentioned in the post? What do you mean by 'disable the 32-bit default jitter'? Would you please provide the exact settings? As far as I know, C# does not have many options to control optimization; I just set 'Config Manager: Release' and the 'Optimize Code' checkbox in Project Properties > Build tab. – S.Serpooshan Apr 07 '16 at 13:35
  • I added the disassembly with the options you mention. The optimized code is different; please see the updated post. – S.Serpooshan Apr 09 '16 at 17:27
  • That looks like 32 bit, not 64 bit asm. Also, make sure to check for Windows updates. – atlaste Apr 09 '16 at 17:31
  • I'm running Windows 8 (32-bit OS) and Visual Studio 2013. I could not set the build platform target to x64. Is it important to have 64-bit asm? – S.Serpooshan Apr 10 '16 at 09:04
  • Well, yes, these are different instruction sets :-) Also, it feels a bit like you're using an older version of the runtime, even though you haven't confirmed that. Different versions compile differently, so that's also relevant. – atlaste Apr 10 '16 at 17:00