We have several full locks in our web application that could perform better if they were replaced with reader-writer locks - even with the additional overhead of the RW lock, our code would run more in parallel. Unfortunately, I cannot use ReaderWriterLockSlim because of what seems like a pretty big synchronization bug that may lead to deadlocks on certain hardware. This article explains the problem in great detail.
The above issue seems to be fixed in .NET Framework 4.0.30319.33440 but not all Windows server versions get this fix. Windows 8.1 and Windows Server 2012 R2 have the fix. Windows Server 2012 (not R2) and Windows Server 2008 R2 still have the bug, even after the latest patches. It looks like Microsoft doesn't plan on fixing this problem on all platforms.
Our various server environments use different versions of Windows Server and I've confirmed that some servers have the fix, while some don't. This means that I cannot safely change all those lock statements to reader-writer locks because on some of our servers the application may deadlock randomly.
As an alternative, I'm looking into reimplementing ReaderWriterLockSlim by decompiling the code in System.Core and creating a new class (something like MyRWLock) that has the one (known) buggy function, ExitMyLock, fixed. All other code is the same as the original ReaderWriterLockSlim.
This requires the removal of all __DynamicallyInvokable attributes and the addition of 2 or 3 classes / structures that are internal to System.Core. I did all this and the new lock class compiles without errors.
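To make the intended change concrete, here is a minimal sketch of the kind of fix I mean. It is not the decompiled code: MySpinLockSketch is a hypothetical, simplified stand-in for the private spin lock inside the copied class, and it assumes the fix amounts to releasing the lock with a volatile write instead of a plain store.

```csharp
using System.Threading;

// Hypothetical, simplified stand-in for the internal spin lock of the copied class.
// Only the names myLock / EnterMyLock / ExitMyLock mirror the original; the rest is illustrative.
public sealed class MySpinLockSketch
{
    private int myLock; // 0 = free, 1 = held

    public void EnterMyLock()
    {
        // Interlocked.CompareExchange acts as a full fence, so acquiring is ordered.
        var spinner = new SpinWait();
        while (Interlocked.CompareExchange(ref myLock, 1, 0) != 0)
        {
            spinner.SpinOnce(); // back off while another thread holds the lock
        }
    }

    public void ExitMyLock()
    {
        // Assumption: the bug is a plain "myLock = 0" store on release.
        // A volatile write keeps earlier writes from being reordered past the release.
        Volatile.Write(ref myLock, 0);
    }

    // Exposed only so the sketch can be checked; not part of the original class.
    public bool IsHeld
    {
        get { return Volatile.Read(ref myLock) != 0; }
    }
}
```

The Volatile class needs .NET 4.5 or later; on a plain .NET 4.0 target, Thread.VolatileWrite would probably be the closest substitute.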
My question is: can anyone think of any reason that this new class wouldn't work as the original ReaderWriterLockSlim class does? I consider myself fairly good when it comes to threading but I'm not an expert. Since I didn't change any code (other than fixing some type names to point to the new MyRWLock instead of ReaderWriterLockSlim, and removing the attributes), I believe this should work. I do wonder, however, if I forgot about something that may break this in various interesting ways that are hard to debug.
Alternatively: is my (and the linked article's) understanding wrong? Does this problem not need fixing in the first place? The author of that article seems to have done a very detailed analysis which looks correct to me and yet Microsoft didn't apply the change to certain Windows Server versions.
Any thoughts on this would be much appreciated.
Edit:
To add more context for the full locks, here's what they are for. We have a service that reads from / writes to a remote service when it performs its work. There are many more reads than writes. A read involves 1 network roundtrip and a write involves 3 network roundtrips to the remote service. When a write happens, no read may happen (the write is really a read->delete->add). Right now, we use full locks around all of these operations but this means that all the threads that try to read still have to queue up until the current read finishes, even without a write. It seems to me that an RW lock would be ideal for this.
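Roughly, the pattern I have in mind looks like the sketch below. RemoteStoreGuard and the delegate parameters are hypothetical names for illustration, not our real code. It uses the framework's ReaderWriterLockSlim, which is where MyRWLock would be swapped in on servers without the fix.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: many concurrent readers, one exclusive writer.
public sealed class RemoteStoreGuard
{
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();

    // A read is 1 network roundtrip; several reads may run at the same time.
    public T Read<T>(Func<T> readFromRemote)
    {
        rwLock.EnterReadLock();
        try
        {
            return readFromRemote();
        }
        finally
        {
            rwLock.ExitReadLock();
        }
    }

    // A write is read->delete->add (3 roundtrips); no read may overlap it.
    public void Write(Action readDeleteAdd)
    {
        rwLock.EnterWriteLock();
        try
        {
            readDeleteAdd();
        }
        finally
        {
            rwLock.ExitWriteLock();
        }
    }
}
```

With this shape, readers only wait while a write is in progress (or queued), instead of waiting behind every other read as they do with the full lock.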