While optimizing a program, I came across some odd performance results, which the following BenchmarkDotNet benchmark reproduces:
string _s, _y = "yo";
[Benchmark]
public void Exchange() => Interlocked.Exchange(ref _s, null);
[Benchmark]
public void CompareExchange() => Interlocked.CompareExchange(ref _s, _y, null);
The results are as follows:
BenchmarkDotNet=v0.10.10, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.192)
Processor=Intel Core i7-6700HQ CPU 2.60GHz (Skylake), ProcessorCount=8
Frequency=2531248 Hz, Resolution=395.0620 ns, Timer=TSC
.NET Core SDK=2.1.4
[Host] : .NET Core 2.0.5 (Framework 4.6.26020.03), 64bit RyuJIT
DefaultJob : .NET Core 2.0.5 (Framework 4.6.26020.03), 64bit RyuJIT
Method | Mean | Error | StdDev |
---------------- |----------:|----------:|----------:|
Exchange | 20.525 ns | 0.4357 ns | 0.4662 ns |
CompareExchange | 7.017 ns | 0.1070 ns | 0.1001 ns |
It would seem that Interlocked.Exchange is almost three times as slow as Interlocked.CompareExchange (20.5 ns vs. 7.0 ns), which is confusing because it's supposed to be doing less work. Unless I'm mistaken, both are supposed to be CPU ops.
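To make the "less work" expectation concrete, here is a minimal sketch of what the two calls do to a string field. The class and method names (InterlockedSemantics, Run) and the string values are hypothetical and only serve the illustration; the Interlocked calls themselves are the same overloads the benchmark uses.

```csharp
using System.Threading;

public static class InterlockedSemantics
{
    // Hypothetical field standing in for _s from the benchmark.
    private static string _s;

    // Returns { value returned by Exchange, value returned by the first
    // CompareExchange, value returned by the second CompareExchange,
    // final content of the field }.
    public static string[] Run()
    {
        _s = "initial";

        // Exchange always stores the new value and returns the old one.
        string exchanged = Interlocked.Exchange(ref _s, null);

        // CompareExchange stores "yo" only if the field currently equals the
        // comparand (null). It does here, so the store happens.
        string firstCas = Interlocked.CompareExchange(ref _s, "yo", null);

        // The field is now "yo", not null, so this call stores nothing and
        // just returns the current value.
        string secondCas = Interlocked.CompareExchange(ref _s, "other", null);

        return new string[] { exchanged, firstCas, secondCas, _s };
    }
}
```

Calling InterlockedSemantics.Run() yields "initial", null, "yo" and "yo", in that order. One thing this makes visible: assuming nothing else resets _s, the CompareExchange benchmark above only performs a store on its very first invocation, while the Exchange benchmark stores on every invocation.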
Does anyone have a good explanation for why this could be happening? Is this an actual performance difference in the CPU ops, or some issue in the way .NET Core wraps them?
If this is the case, does it seem best to simply avoid Interlocked.Exchange() and use Interlocked.CompareExchange() whenever possible?
EDIT: Another odd thing: when I run the same benchmarks with int or long rather than string, I get more or less the same running time. Also, I used BenchmarkDotNet's disassembler diagnoser to look at the actual assembly being generated, and found something interesting: with the int/long version I can clearly see xchg and cmpxchg instructions, but with strings I see calls into the Interlocked.Exchange/Interlocked.CompareExchange methods!
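For anyone who wants to reproduce the int comparison, this is roughly what that variant looks like. It is a sketch, not the exact code I ran: the class name and the fields _i and _j are hypothetical stand-ins for _s and _y, and it assumes a BenchmarkDotNet version that ships the [DisassemblyDiagnoser] attribute in the BenchmarkDotNet.Attributes namespace.

```csharp
using System.Threading;
using BenchmarkDotNet.Attributes;

// Asks BenchmarkDotNet to include the JIT-generated assembly in its report
// (assumption: the attribute is available in the installed version).
[DisassemblyDiagnoser]
public class InterlockedIntBenchmarks
{
    // Hypothetical int counterparts of _s and _y from the string benchmark.
    private int _i;
    private int _j = 1;

    // int overload: always stores 0 and returns the previous value.
    [Benchmark]
    public int Exchange() => Interlocked.Exchange(ref _i, 0);

    // int overload: stores _j only if _i currently equals 0,
    // and returns the value _i held before the call.
    [Benchmark]
    public int CompareExchange() => Interlocked.CompareExchange(ref _i, _j, 0);
}
```

It can be run from a console project with BenchmarkRunner.Run<InterlockedIntBenchmarks>() (from the BenchmarkDotNet.Running namespace). Swapping int for long gives the long variant.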
EDIT2: Opened issue in coreclr: https://github.com/dotnet/coreclr/issues/16051