5

I am experimenting with compiler performance. I have a very small piece of code, just a few floating-point multiplications and additions, which is executed in a loop several million times. I compile the code in C++, C#, Java, Perl, Python ... and then run the result while measuring the execution time.

I am quite dissatisfied with the C# performance. My C# code is about 60% slower than the equivalent C++ or Java. I am sure there must be a bug in my experiment. I would like to disassemble the resulting binary to learn why.

Is there such an option for MSIL code? Can the output of the C# JIT compiler be examined at the machine-instruction level (x86 instructions, not MSIL instructions)?

Update

The code (u, v, and g* are double; steps is an integer):

stopwatch.Start();
for (int i = 0; i < steps; i++)
{
    double uu = g11 * u + g12 * v;
    v = g21 * u + g22 * v;
    u = uu;
}
stopwatch.Stop();
danatel
  • Are you sure your benchmark allows the JIT to kick in in the first place? Perhaps you should show it. –  May 14 '11 at 11:22
  • @delnan: How do you mean? The code couldn't be run if the JIT didn't compile it. – Guffa May 14 '11 at 11:30
  • @Guffa: the question is if the JIT time should be included in the benchmark. – H H May 14 '11 at 11:35
  • Show us the code and describe how you run it. Specify how and where you are measuring the time. Be aware that your code and any of its callees get JITted the first time they are called, which of course introduces a time penalty. After that, there are not many reasons why the code should run more slowly than equivalent code that was compiled directly into a native binary. Yeah, exactly what @Guffa meant, he was just faster (and less verbose) than me. – Paul Michalik May 14 '11 at 11:38
  • @Guffa, Paul Michalik - the execution time is between 3 and 5 seconds. – danatel May 14 '11 at 11:50
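
To make the JIT-time question from the comments concrete, here is a minimal sketch of one way to see the one-off JIT cost separately from the steady-state loop time. The class name, the `Iterate` method, the coefficient values, and the step count are all illustrative assumptions; the question does not show the real setup.

```csharp
using System;
using System.Diagnostics;

static class JitWarmupBenchmark
{
    // The loop from the question, wrapped in a method so it can be timed twice.
    static double Iterate(int steps)
    {
        // Hypothetical coefficients (a rotation, so u and v stay bounded);
        // the question does not give the real values.
        double g11 = 0.6, g12 = -0.8, g21 = 0.8, g22 = 0.6;
        double u = 1.0, v = 0.0;

        for (int i = 0; i < steps; i++)
        {
            double uu = g11 * u + g12 * v;
            v = g21 * u + g22 * v;
            u = uu;
        }

        // Return a value derived from the loop so the work is actually used.
        return u + v;
    }

    static void Main()
    {
        const int steps = 100000000; // assumed iteration count

        // First call: the measured time includes JIT-compiling Iterate.
        Stopwatch first = Stopwatch.StartNew();
        double r1 = Iterate(steps);
        first.Stop();

        // Second call: Iterate has already been JIT-compiled.
        Stopwatch second = Stopwatch.StartNew();
        double r2 = Iterate(steps);
        second.Stop();

        Console.WriteLine("first call:  {0} ms (result {1})", first.ElapsedMilliseconds, r1);
        Console.WriteLine("second call: {0} ms (result {1})", second.ElapsedMilliseconds, r2);
    }
}
```

If the two timings are close, JIT time is probably not what explains a gap measured over a run of several seconds.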

3 Answers

7

Debug your code in Visual Studio (but compile in release mode), put a breakpoint in the loop, and open the Disassembly window (Debug -> Windows -> Disassembly) when the execution stops at the breakpoint.

Guffa
  • @danatel: It works but you might need to do better. Rewrite the program to run the code of interest twice, and put up a message box between the two runs. Attach the debugger to the process when the message box comes up. Remember, **the jitter knows whether you are debugging or not**. It can, and does, generate less optimal code when the debugger is running to make the debugging experience better. If you want to know what the jitter is doing when the debugger is not running, then attach the debugger after the jitter generates the code the first time. – Eric Lippert May 14 '11 at 14:57
  • @Eric Lippert: I know that there is a big difference between compiling in debug mode and release mode, but is there also a difference depending on whether there is a debugger attached? – Guffa May 15 '11 at 00:11
  • @Guffa: There can be. The jitter knows whether a debugger is attached and can choose to skip some optimizations that make it hard to debug. – Eric Lippert May 15 '11 at 00:18
  • @Eric Lippert: Thank you for your advice, but in this case the difference was clearly visible even in debug mode. The C++ code cleverly keeps everything on the FPU stack, while the C# code pushes and pulls the variables to and from RAM. The next step in my experiment is to use the gcc and Intel compilers for comparison. – danatel May 15 '11 at 07:26
  • @danatel: In debug builds (technically, any assembly that doesn't have the attribute that allows optimisation) the jitter stops enregistering variables, so looking at debug builds in this way is silly unless you are in fact running debug builds normally! – ShuggyCoUk May 15 '11 at 13:41
  • @ShuggyCoUk - maybe I was not precise. By "debug mode" I mean, in the context of Guffa's method, a release build with the debugger attached. Nevertheless, the point is that I started the experiment by measuring the release build with full optimization and no debugger attached, and it was slower than the C++ code. – danatel May 15 '11 at 20:27
  • @danatel: I would guess you didn't follow the proviso about running the code once *then* attaching the debugger. I think it unlikely that no variables would be enregistered if you did. This doesn't change the performance differential, but if you are assuming that is the reason it is slower, you are likely to be incorrect... – ShuggyCoUk May 16 '11 at 09:22
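
The "run it twice, attach in between" advice from the comments could be sketched roughly as below. This is only an illustration under stated assumptions: a console prompt stands in for the suggested message box, and the class name, the `Iterate` method, the coefficients, and the step count are hypothetical. Build it in release mode and start it without the debugger.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

static class AttachAfterJit
{
    // NoInlining keeps the loop in its own method, which makes it easier
    // to locate in the Disassembly window.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static double Iterate(int steps)
    {
        // Hypothetical coefficients; the question does not give the real values.
        double g11 = 0.6, g12 = -0.8, g21 = 0.8, g22 = 0.6;
        double u = 1.0, v = 0.0;

        for (int i = 0; i < steps; i++)
        {
            double uu = g11 * u + g12 * v;
            v = g21 * u + g22 * v;
            u = uu;
        }

        return u + v;
    }

    static void Main()
    {
        const int steps = 10000000; // assumed iteration count

        // First run: the jitter compiles Iterate here. If the program was
        // started without a debugger, IsAttached prints False at this point.
        Console.WriteLine("first run result: {0}", Iterate(steps));
        Console.WriteLine("debugger attached during first run: {0}", Debugger.IsAttached);

        // Pause so a debugger can be attached to the already-jitted process.
        Console.WriteLine("Attach the debugger to process {0}, then press Enter.",
            Process.GetCurrentProcess().Id);
        Console.ReadLine();

        // Second run: put a breakpoint inside Iterate before pressing Enter,
        // then open the Disassembly window when it is hit.
        Console.WriteLine("second run result: {0}", Iterate(steps));
    }
}
```

The idea is that the machine code shown at the breakpoint was generated during the first run, before any debugger was present.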
4

Ngen your program and disassemble the results.

Andrew Savinykh
1

Maybe you could ngen (compile to native code) the binary first, to avoid the JIT compilation.

M4N