4

I wasn't able to get a definitive answer on this so this question. There are few SO posts in the past that mentioned that instances of HashAlgorithm are not thread-safe quoting snippet in the MSDN doc.

See

But, the current MSDN doc doesn't say so. Surprisingly, the below code doesn't bomb on net3.1, net5.0, but does on net6.0. So, it looks like it was made thread-safe (perhaps), but perhaps net6.0 has a bug.

//<TargetFrameworks>net6.0;net5.0;netcoreapp3.1;net48</TargetFrameworks>
[Explicit]
[Test]
public void Bork_HashAlgorithm()
{
    const int iterations = 1_000_000;
    var bytes = Encoding.UTF8.GetBytes("the overtinkerer");
    using (var md5 = MD5.Create())
    {
        Parallel.For(0, iterations, (i, loop) =>
        {
            md5.ComputeHash(bytes);
        });
    }
}

Exception message:

SafeHandle cannot be null. (Parameter 'pHandle')

hIpPy
  • 4,649
  • 6
  • 51
  • 65
  • 1
    It's generally assumed in MSDN that an instance method is not thread-safe unless explicitly stated. – Charlieface Jul 17 '22 at 02:47
  • 1
    The problem with thread safety problems is that the buggy code can work just often enough to lull you into a false sense of security. – Ian McLaird Jul 17 '22 at 16:55
  • @IanMcLaird I can tell you first-hand that your statement is true. I've seen `HMACSHA256` `HashAlgorithm` shared instance work in PRD for years without issue on net3.1, but doesn't pass the above test. :exploding-head: – hIpPy Jul 17 '22 at 19:33
  • 2
    FFT: `net5.0`+ adds a thread-safe static method for md5 (https://learn.microsoft.com/en-us/dotnet/api/system.security.cryptography.md5.hashdata?view=net-5.0). Similar for hmac (https://learn.microsoft.com/en-us/dotnet/api/system.security.cryptography.hmacsha256.hashdata?view=net-6.0). Similar effort is also done for random with a static thread-safe impl (https://learn.microsoft.com/en-us/dotnet/api/system.random.shared?view=net-6.0). I went with the thread-safe static method as new'ing up was quite slower. `ThreadLocal`/`ThreadStatic` are _faster_, and also an option. – hIpPy Aug 20 '22 at 02:41

1 Answers1

12

No, it's not thread-safe.

We're seeing same exceptions under peak load: SafeHandle cannot be null. (Parameter 'pHandle') and similar errors for MD5 provider, for SHA1, and for SHA256, all over the place.

However we found that it's still beneficial to use a "singleton" instance of HashAlgorithm with a lock, it's still 3X faster than creating a separate instance every time.

Here's the benchmark

Method Mean Error StdDev Gen0 Gen1 Gen2 Allocated
MD5Recreate 1,282.5 ns 726.26 ns 39.81 ns 0.0801 0.0286 0.0038 512 B
MD5SingletonWithLock 402.2 ns 39.38 ns 2.16 ns 0.0610 - - 384 B
MD5_HashData 467.7 ns 33.26 ns 1.82 ns 0.0548 - - 344 B

Here's the code:

static MD5 _md5 = MD5.Create(); // <-- one instance for all threads
public static byte[] MD5Hash(byte[] input)
{
    lock (_md5) // <-- use a lock
    {
        return _md5.ComputeHash(input);
    }
}

UPDATE: MD5_HashData is the new static MD5.HashData method introduced in .NET 5 - it's not faster than using lock. However MD5.HashData shows better performance under heavy parallelism, which BDN cannot emulate (see comments below)

UPDATE2: adding benchmarks for SHA256

Method Mean Error StdDev Gen0 Gen1 Allocated
SHARecreate 2.211 us 3.7097 us 0.2033 us 0.1144 0.0534 736 B
SHASingletonWithLock 1.231 us 0.1349 us 0.0074 us 0.0954 - 608 B
SHA_HashData 1.296 us 0.0737 us 0.0040 us 0.0877 - 552 B
Alex from Jitbit
  • 53,710
  • 19
  • 160
  • 149
  • I see similar results in benchmark.net. But with `Parallel.For()`, `MD5_HashData` is 4x faster than `MD5SingletonWithLock`. So, I'm not convinced with the benchmark.net results. (See https://github.com/rmandvikar/random2/compare/dev...hashalgorithm) – hIpPy Oct 27 '22 at 07:39
  • 1
    @hIpPy fair poin! However, your test code runs a million operation in parallel which is far from our real load (I assume `Parallel` is internally limited by the number of CPU cores?) However, thanks for the data. I guess one should peak the preferred method based on their scenarios. – Alex from Jitbit Oct 27 '22 at 07:43
  • 1
    @hIpPy I have just updated the answer with SH256 benchmarks as well – Alex from Jitbit Oct 27 '22 at 07:48
  • 1
    I see. With `maxDegreeOfParallelism = 1`, I get similar results to benchmark.net, but with `maxDegreeOfParallelism = 2`, `SHA_HashData` is still ~1.6x faster than `MD5SingletonWithLock`. :shrug: I don't feel that's far-fetched. – hIpPy Oct 27 '22 at 08:06
  • That said, my question doesn't specifically ask for a _faster_ or _fastest_ option, just a _correct_ option. So I'm willing to accept your answer. I'd request that you please add a caveat to the point about perf I made (else the answer would look misleading). FWIW, I went with thread-safe static method (`SHA_HashData`). – hIpPy Oct 27 '22 at 08:13
  • Am I blind? Where is the implementation for the referenced methods? E.g. `SHA_HashData` – Jaans Nov 22 '22 at 00:27
  • 1
    @Jaans it's the built-in `SHA.HashData` – Alex from Jitbit Nov 22 '22 at 10:37