7

I came across some odd performance results when optimizing a program, which are shown in the following BenchmarkDotNet benchmark:

string _s, _y = "yo";

[Benchmark]
public void Exchange() => Interlocked.Exchange(ref _s, null);

[Benchmark]
public void CompareExchange() => Interlocked.CompareExchange(ref _s, _y, null);

The results are as follows:

BenchmarkDotNet=v0.10.10, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.192)
Processor=Intel Core i7-6700HQ CPU 2.60GHz (Skylake), ProcessorCount=8
Frequency=2531248 Hz, Resolution=395.0620 ns, Timer=TSC
.NET Core SDK=2.1.4
  [Host]     : .NET Core 2.0.5 (Framework 4.6.26020.03), 64bit RyuJIT
  DefaultJob : .NET Core 2.0.5 (Framework 4.6.26020.03), 64bit RyuJIT

          Method |      Mean |     Error |    StdDev |
---------------- |----------:|----------:|----------:|
        Exchange | 20.525 ns | 0.4357 ns | 0.4662 ns |
 CompareExchange |  7.017 ns | 0.1070 ns | 0.1001 ns |

It would seem that Interlocked.Exchange is more than twice as slow as Interlocked.CompareExchange - which is confusing because it's supposed to be doing less work. Unless I'm mistaken both are supposed be CPU ops.

Does anyone have a good explanation on why this could be happening? Is this an actual performance difference in the CPU ops or some issue in the way .NET Core is wrapping them?

If this is the situation it seem best to simply avoid Interlocked.Exchange() and use Interlocked.CompareExchange() whenever possible?

EDIT: Another odd thing: when I run the same benchmarks with int or long rather than string, I get more or less the same running time. Also, I used BenchmarkDotNet's disassembler diagnoser to look at the actually assembly being generated, and found something interesting: with the int/long version I can clearly see xchg and cmpxchg instructions, but with strings I see call into the Interlocked.Exchange/Interlocked.CompareExchange methods...!

EDIT2: Opened issue in coreclr: https://github.com/dotnet/coreclr/issues/16051

Shay Rojansky
  • 15,357
  • 2
  • 40
  • 69
  • 1
    Your CompareExchange is not doing an exchange, because `_s != null`. – Blorgbeard Jan 26 '18 at 21:57
  • Yeah, check the [docs](https://msdn.microsoft.com/en-us/library/h7etff8w(v=vs.110).aspx) you have the arguments the wrong way around – Matt Burland Jan 26 '18 at 21:59
  • Oh wait, it *does* equal null. Your `string _s, y = "yo";` syntax confused me. But that means your tests are doing different things. Probably two separate errors. – Blorgbeard Jan 26 '18 at 22:00
  • This seems related to the generic overload of `Exchange`. If you run this same code with `_s` and `_y` typed as `object`, both methods perform approximately the same. The difference seems to show itself only when the method call resolves to `Interlocked.Exchange`. This is also the reason why you are not directly seeing `xchg` instructions, there is a level of indirection. The question still remains though, why is it apparently only slowing down `Exchange`? – InBetween Jan 26 '18 at 22:41
  • @Blorgbeard The behavior is weird, see my previous comment. There is no reason why `Exchange` should be slower. – InBetween Jan 26 '18 at 22:50
  • 2
    This looks like a side-effect of efforts to optimize framework code. Added somewhere in version 4.x. Pretty clever hack, the jitter monkey-patches the call to COMInterlocked::CompareExchangeGeneric() to COMInterlocked::CompareExchangeObject(). Which is illegal in C# but safe in this case because the actual type does not matter, it only swaps object references and the method is constrained to reference types. What he overlooked however is that he could have done the exact same thing with Exchange(). Not on his to-do list, probably :) – Hans Passant Jan 27 '18 at 13:46
  • So good advice is to favor CompareExchange(). Something that Microsoft programmers that worked on the BCL types seemed to be aware of, lots of places in the framework code where CompareExchange is being used where Exchange would do. – Hans Passant Jan 27 '18 at 13:48
  • Thanks @HansPassant, that seems like the right answer. I'll open an issue to make the .NET team aware of this, if you post your comment as an answer I'll accept it. – Shay Rojansky Jan 27 '18 at 16:54
  • At the end of the day though, don't you still have to use the method that has the required semantics? I.e. if you want to use CompareExchange instead of Exchange, you would have to call it in a loop until it succeeds. At which point you have a loop that takes an unknown amount of time to succeed. CompareExchange is worth optimizing hard since you expect it (I believe) to be used frequently in loops. Exchange not so much. – Damien_The_Unbeliever Jan 27 '18 at 17:09
  • In case not clear from above - if you want to use CompareExchange where you're currently using Exchange, you need to have foreknowledge of the contents of the value stored at the destination. If you don't have foreknowledge of the contents (that is universally true) then you need to write a loop. Therefore I don't understand why you'd even start benchmarking these things. – Damien_The_Unbeliever Jan 27 '18 at 17:14
  • @Damien_The_Unbeliever first I don't really need a reason to benchmark things, it could be curiosity. Second, I'm looking at doing the opposite of what you said: I have code that currently uses CompareAndExchange but actually does not need to care about the current value, so Exchange is more appropriate (that's why I benchmarked). Finally, my usage of Exchange would be in a very performance-sensitive loop, and I really don't know why you'd assume Exchange isn't worth optimizing (after all there's a CPU op for it). – Shay Rojansky Jan 28 '18 at 18:00
  • In any case, the .NET Core team is working on fixing the issue, see the github issue link I posted. – Shay Rojansky Jan 28 '18 at 18:01

2 Answers2

8

Following up on my commentaries, this seems to be an issue with the generic overload of Exchange.

If you avoid the generic overload altogether (changing the type of _s and _y to object), the performance difference disappears.

The question remains though as to why resolving to the generic overloads only slows down Exchange. Reading through the Interlocked source code, it seems that a hack was implemented in CompareExchange<T> to make it faster. Source code commentaries on CompareExchange<T> follow:

 * CompareExchange<T>
 * 
 * Notice how CompareExchange<T>() uses the __makeref keyword
 * to create two TypedReferences before calling _CompareExchange().
 * This is horribly slow. Ideally we would like CompareExchange<T>()
 * to simply call CompareExchange(ref Object, Object, Object); 
 * however, this would require casting a "ref T" into a "ref Object", 
 * which is not legal in C#.
 * 
 * Thus we opted to cheat, and hacked to JIT so that when it reads
 * the method body for CompareExchange<T>() it gets back the
 * following IL:
 *
 *     ldarg.0 
 *     ldarg.1
 *     ldarg.2
 *     call System.Threading.Interlocked::CompareExchange(ref Object, Object, Object)
 *     ret
 *
 * See getILIntrinsicImplementationForInterlocked() in VM\JitInterface.cpp
 * for details.

Nothing similar is commented in Exchange<T> and it also makes use of the "horribly slow" __makeref so this could be the reason why you are seeing this unexpected behavior.

All this is of course my interpretation, you'd actually need someone of the .NET team to really confirm my suspicions.

InBetween
  • 32,319
  • 3
  • 50
  • 90
1

This has now been fixed on newer versions of .Net Core:

https://github.com/dotnet/coreclr/issues/16051

Yair Halberstadt
  • 5,733
  • 28
  • 60