
I've heard rumours that `lock` is "slow", but I had never tried to measure its performance myself until now. What is the right way to benchmark it? I use the BenchmarkDotNet code below, which produces consistent results, but I'm still not sure whether I'm doing it correctly.

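// Namespaces assumed for BenchmarkDotNet v0.9.x (the version shown in the results).
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Exporters;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

class Program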
{
    static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<ClassUnderTest>();
    }
}

[Config(typeof(Config))]
public class ClassUnderTest
{
    private class Config : ManualConfig
    {
        public Config()
        {
            Add(Job.LegacyJitX64);
            Add(MarkdownExporter.StackOverflow);
        }
    }

    private readonly object _o = new object();

    public decimal Money;

    [Setup]
    public void SetupData()
    {
        Money = 0;
    }

    [Benchmark(Baseline = true)]
    public decimal NoLock()
    {
        return ++Money;
    }

    [Benchmark]
    public decimal Lock()
    {
        lock (_o)
        {
            return ++Money;
        }
    }
}

Results:

Host Process Environment Information:
BenchmarkDotNet=v0.9.8.0
OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-4790 CPU 3.60GHz, ProcessorCount=8
Frequency=3507519 ticks, Resolution=285.1018 ns, Timer=ACPI
CLR=MS.NET 4.0.30319.42000, Arch=32-bit RELEASE
GC=Concurrent Workstation
JitModules=clrjit-v4.6.1055.0

Type=ClassUnderTest  Mode=Throughput  Platform=X64  
Jit=LegacyJit  GarbageCollection=Concurrent Workstation  

 Method |     Median |    StdDev | Scaled |
------- |----------- |---------- |------- |
 NoLock | 22.7166 ns | 0.1533 ns |   1.00 |
   Lock | 38.0836 ns | 0.2947 ns |   1.68 |
Alex G.
  • The slowness doesn't come from the lock itself. It comes from the fact that, as you increase the number of threads contending for that lock, the average time each one spends waiting for it increases. – Jonathon Reinhart Jul 18 '16 at 12:04
  • I would always use locks as a last resort, after first trying to build an immutable pipeline. – Callum Linington Jul 18 '16 at 12:06
  • An uncontended lock takes around 20ns, so these results seem ok. It's not the lock that is slow, it's the fact that 1) while one thread is doing the work, other threads must wait, and 2) the context switch will take ~1 microsecond. In this particular case, if performance was a bottleneck and you needed multithreaded updating, you could switch the `decimal` to a `long` and simply use `Interlocked.Increment` to get atomic lockless addition. – vgru Jul 18 '16 at 12:34
  • My apologies if I wasn't clear enough. I'm not trying to improve anything. My intent was to measure the time to acquire/release lock (obviously, with the help of BenchmarkDotNet). As you can see, the code is very synthetic and it has nothing to do with real-world application. I just wanted to get some numbers and verify if they'd make sense. – Alex G. Jul 18 '16 at 13:18
  • @Groo, `takes around 20ns` ... `the context switch will take ~1 microsecond` Any credible source for this? This is exactly what I wanted. – Alex G. Jul 18 '16 at 13:18
  • @AlexeyGroshev: of course, it's mandatory reading for anyone doing concurrent programming: [Threading in C#, by Joe Albahari](http://www.albahari.com/threading/). There is a performance comparison of several locking constructs in a table in [part 2. "Basic synchronization"](http://www.albahari.com/threading/part2.aspx#_Locking). You basically choose between a `Monitor` (20ns), `Mutex` (1us) or `ReaderWriterLockSlim` (40ns), depending on your goals (or of course a `SemaphoreSlim` if you need a semaphore besides locking). – vgru Jul 18 '16 at 15:25
  • I didn't see that table in the book. – Alex G. Jul 18 '16 at 15:49
  • @Alexey: [Part 2., Locking](http://www.albahari.com/threading/part2.aspx#_Locking). – vgru Jul 18 '16 at 16:13
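
The lock-free alternative mentioned in the comments (switching the `decimal` to a `long` and using `Interlocked.Increment`) could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name `InterlockedCounterSketch`, the `Run` helper and the worker counts are hypothetical and not part of the original benchmark, and it shows correctness under concurrent updates rather than timing.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration; not part of the benchmark in the question.
public static class InterlockedCounterSketch
{
    // A long stands in for the question's decimal Money,
    // because Interlocked has no overload for decimal.
    private static long _money;

    // Increments the shared counter from several parallel workers
    // and returns the final value.
    public static long Run(int workers, int incrementsPerWorker)
    {
        _money = 0;
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < incrementsPerWorker; i++)
            {
                // Atomic increment: no lock is taken and no update is lost.
                Interlocked.Increment(ref _money);
            }
        });
        // Interlocked.Read gives an atomic 64-bit read, even in a 32-bit process.
        return Interlocked.Read(ref _money);
    }

    public static void Main()
    {
        Console.WriteLine(Run(4, 100000)); // prints 400000
    }
}
```

Replacing `Interlocked.Increment(ref _money)` with a plain `_money++` would typically lose updates under contention, which is the problem that both `lock` and `Interlocked` solve.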
