
I was testing how Interlocked.Increment and lock behave on my computer's architecture because I read the following line in this article:

As rewritten with Interlocked.Increment, the method should execute faster, at least on some architectures.

Using the following code, I became convinced that it's worth reviewing the locks in my projects.

var watch = new Stopwatch();
var locker = new object();
int counter = 0;

// 100 million increments guarded by a lock
watch.Start();
for (int i = 0; i < 100000000; i++)
{
    lock (locker)
    {
        counter++;
    }
}
watch.Stop();
Console.WriteLine(watch.Elapsed.TotalSeconds);

watch.Reset();
counter = 0;

// the same 100 million increments via Interlocked.Increment
watch.Start();
for (int i = 0; i < 100000000; i++)
{
    Interlocked.Increment(ref counter);
}
watch.Stop();
Console.WriteLine(watch.Elapsed.TotalSeconds);

I'm getting stable results, roughly 2.4 s for the lock and 1.2 s for Interlocked. However, I was surprised to discover that running this code in Release mode improves only the Interlocked figure, to approximately 0.7 s, while the locking time stays the same. Why is that? How is Interlocked optimized in Release mode in a way that lock is not?

Ondrej Janacek
  • Generally, all performance measurements should be done only in Release mode; Debug mode performance measurements are not relevant. Read the MSIL produced by the C# compiler, maybe the code is optimized too aggressively (for example, Interlocked replaced by ++). – Alex F Dec 22 '13 at 14:07
  • 1
    An uncontended lock region needs two interlocked instructions. The numbers you measured show this perfectly, at least in debug mode. Btw, your benchmark means very little because synchronization cost depends on lock and cache-line contention. Your benchmark assumes that there is none. – usr Dec 22 '13 at 14:10
  • @usr It would be great if you could write more about it and post it as an answer. I don't understand MSIL as Alex suggests, and since I've read the whole article and this and still don't completely understand, more info on the topic would be welcome. – Ondrej Janacek Dec 22 '13 at 14:16
  • @OndrejJanacek I don't have the answer because I don't know why the loops behave differently in Release mode. I'd have to look at the disassembly. I don't have time for that investigation right now. – usr Dec 22 '13 at 14:18
  • @usr I don't need the answer right now. – Ondrej Janacek Dec 22 '13 at 14:21
  • @OndrejJanacek - you will be surprised that MSIL is relatively high level and easy to read. Just type ILDASM in VS Command Prompt window and open your executable. It may be interesting... – Alex F Dec 22 '13 at 14:43
  • 2
    Performance testing in debug mode is a pointless waste of time; none of the numbers you get will be even vaguely meaningful -- unless of course your customers are going to be running your program in debug mode. – Eric Lippert Dec 22 '13 at 16:04
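
As a rough illustration of usr's point about contention, here is a minimal sketch of the same benchmark run from several threads at once, so that the lock (and the counter's cache line) is actually contended. The thread count, the iteration split, and the Task-based structure are my own assumptions, not part of the original question.

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

class ContendedBenchmark
{
    // Assumed values, adjust for your machine.
    const int Threads = 4;
    const int IterationsPerThread = 25000000;

    static readonly object Locker = new object();
    static int counter;

    static void Main()
    {
        Console.WriteLine("lock, contended:        " + Measure(LockIncrement));
        counter = 0;
        Console.WriteLine("Interlocked, contended: " + Measure(InterlockedIncrement));
    }

    static void LockIncrement()
    {
        for (int i = 0; i < IterationsPerThread; i++)
        {
            lock (Locker)
            {
                counter++;
            }
        }
    }

    static void InterlockedIncrement()
    {
        for (int i = 0; i < IterationsPerThread; i++)
        {
            Interlocked.Increment(ref counter);
        }
    }

    static double Measure(Action body)
    {
        var watch = Stopwatch.StartNew();

        // Run the same loop on several threads so the lock is contended.
        var tasks = new Task[Threads];
        for (int t = 0; t < Threads; t++)
        {
            tasks[t] = Task.Run(body);
        }
        Task.WaitAll(tasks);

        watch.Stop();
        return watch.Elapsed.TotalSeconds;
    }
}

Results under contention can look very different from the single-threaded numbers above, which is exactly the point of usr's comment.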

1 Answer


You have to look at the generated machine code to see the difference: Debug + Windows + Disassembly. The debug build version of the Interlocked.Increment() call:

   00FC27AD  call        7327A810 

The release build version:

   025F279D  lock inc    dword ptr [ebp-24h] 

In other words, the jitter optimizer got really smart in the Release build and replaced a call to a helper function with a single machine instruction.

Optimization just doesn't get better than that. The same optimization cannot be applied to the Monitor.Enter() method call that sits underneath the lock statement; it is a pretty substantial function implemented in the CLR that cannot be inlined. It does many things beyond what Interlocked.Increment() does: it lets the operating system reschedule when a thread blocks trying to acquire the monitor, and it maintains a queue of waiting threads. That can be pretty important for good concurrency, just not in your test code, since the lock is entirely uncontended. Beware of synthetic benchmarks that don't approximate actual usage.
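
For reference, the lock statement itself is only compiler-generated sugar over Monitor. A minimal sketch of what the compiler emits for the loop body (the C# 4.0+ expansion in spirit, not the exact IL):

bool lockTaken = false;
try
{
    // Monitor.Enter is a substantial CLR function; the jitter cannot
    // reduce it to a single "lock inc" the way it can for
    // Interlocked.Increment.
    Monitor.Enter(locker, ref lockTaken);
    counter++;
}
finally
{
    if (lockTaken)
    {
        Monitor.Exit(locker);
    }
}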

Hans Passant