1

We have several full lock-s in our web application that could be more performant if they were replaced with read-writer locks - even with the additional overhead of the RW lock, our code would be more parallel. Unfortunately, I cannot use ReaderWriterLockSlim because of what seems like a pretty big synchronization bug that may lead to deadlocks on certain hardware. This article explains the problem in great detail.

The above issue seems to be fixed in .NET Framework 4.0.30319.33440 but not all Windows server versions get this fix. Windows 8.1 and Windows Server 2012 R2 have the fix. Windows Server 2012 (not R2) and Windows Server 2008 R2 still have the bug, even after the latest patches. It looks like Microsoft doesn't plan on fixing this problem on all platforms.

Our various server environments use different versions of Windows Server and I've confirmed that some servers have the fix, while some don't. This means that I cannot safely change all those lock statements to reader-writer locks because on some of our servers the application may deadlock randomly.

As an alternative, I'm looking into reimplementing ReaderWriterLockSlim by decompiling the code in System.Core and creating a new class (something like MyRWLock) that has the one (known) buggy function ExitMyLock fixed. All other code is the same as the original ReaderWriterLockSlim.

This requires the removal of all __DynamicallyInvokable attributes and the addition of 2 or 3 classes / structures that are internal to System.Core. I did all this and have the new lock class compiles without errors.

My question is: can anyone think of any reason that this new class wouldn't work as the original ReaderWriterLockSlim class does? I consider myself fairly good when it comes to threading but I'm not an expert. Since I didn't change any code (other than fixing some type names to point to the new MyRWLock instead of ReaderWriterLockSlim and the attributes), I believe this should work. I do wonder, however, if I forgot about something that may break this in various interesting ways that are hard to debug.

Alternatively: is my (and the linked article's) understanding wrong? Does this problem not need fixing in the first place? The author of that article seems to have done a very detailed analysis which looks correct to me and yet Microsoft didn't apply the change to certain Windows Server versions.

Any thoughts on this would be much appreciated.

Edit:

To add more context for the full locks, here's what they are for. We have a service that reads / writes a remote service when it performs its work. There are many more reads than writes. A read involves 1 network roundtrip and a write involves 3 network roundtrips to the remote service. When a write happens, no read may happen (the write is really a read->delete->add). Right now, we use full lock-s around all of these operations but this means that all the threads that try to read still have to queue up until the current read finishes, even without a write. It seems to me that an RW lock would be ideal for this.

xxbbcc
  • 16,930
  • 5
  • 50
  • 83
  • `that could be more performant if they were replaced with read-writer locks`, First profile it whether it is worth to change it or not. – L.B Oct 23 '14 at 20:12
  • Also, unit test your code. Make sure changing these methods hasn't caused anything else to break. – Yuval Itzchakov Oct 23 '14 at 20:14
  • 1
    @L.B The locks are used to make certain operations atomic. These operations involve network traffic so it's pretty much guaranteed that they would be faster with RW locks. Well over half of these operations involve reading a remote server but some try to write it. – xxbbcc Oct 23 '14 at 20:15
  • @L.B It's not very well designed but changing it is not feasible at the moment - these operations go against an interface that was not designed for atomic access. – xxbbcc Oct 23 '14 at 20:17
  • @xxbbcc I know what *lock* is. But the results may not always be as you expected. I repeat, first profile it(at least with a simplified case) and see how much performance improvement you will get. (BTW: you may also try to use 2 locks, one for read and one for write to test your case) – L.B Oct 23 '14 at 20:17
  • 1
    @L.B I didn't mean to imply that you don't know what a lock is. I simply said (or tried to say) that our threads do a _lot_ more reading than writing. Each read is 1 roundtrip, each write is 3 roundtrips. While a write is happening, no read may happen. This seems pretty clear-cut to me - it's not possible that the RW lock overhead would be as expensive as the constant full locking on operations that may go in parallel (but are serialized by the full lock). Do you see an error in my reasoning? – xxbbcc Oct 23 '14 at 20:32
  • This threading bug is really nasty.; The only reason decompiling the code would not work is if the JIT treats this class specially. That is very unlikely. The framework has only a handful of places that contain runtime hooks. Managed code is expressive enough for almost everything. You should be quite safe doing this.; Consider just implementing your own RWL. Use Monitor internally. It is probably not that hard if you just lock everything down for each operation. You don't care about constant costs here. – usr Oct 23 '14 at 21:01
  • @usr I'm somewhat wary of rolling my own RW lock - if I overlook even something small, I'll introduce a hard-to-fix bug in a system that's used by over 100,000 users. If it makes into production, this would be a big deal. This is why I'm hoping no one sees a possible problem with my current approach - I'm not changing any logic in the existing code (my machine already has the fix) so I'm really just duplicating the correct ReaderWriterLockSlim under a different name. – xxbbcc Oct 23 '14 at 21:14
  • Good point. Another safety strategy is to always wait on locks with a timeout. In case of a bug (rare) you stall parts of the app for a while and then fail some requests. Better than exhausting the thread pool with waiting threads within a few seconds. – usr Oct 24 '14 at 07:41

1 Answers1

1

Is the article accurate? I don't know. They at least point to updates in .NET that address the issue they claim exists, so that suggests it is. On the other hand, there is still the question as to whether it applies to you. Desktop x86 machines have volatile semantics for all memory access anyway, so if the only bug describes is the lack of an explicit volatile operation, that wouldn't be a concern on a huge part of the hardware out there.

On big server hardware, the architectures are different, and they might well have different semantics, requiring explicitly volatile operations. That'd be something you might want to research.

Of course, regardless of hardware architecture, "volatile" also has meaning to the .NET compilers. The code quoted in the article doesn't seem to me like it would be affected by compiler optimizations, but I'm not an expert in that area so I couldn't say for sure.

I take it from the question that you have not actually witnessed this bug occur. That it's a purely hypothetical concern at this point.

I would be wary of trying to reimplement this type of class, even using the original source as a starting point.

My first choice would be to just ensure I've upgraded to a recent enough .NET version that the bug described in the article you've referenced has been fixed.

Second choice would be to use the plain ReaderWriterLock class instead of ReaderWriterLockSlim.

Third choice would be to use the native Win32 "slim reader/writer lock" structure (i.e. via p/invoke or C++/CLI).

Only if none of those options proved workable would I go to the trouble to implement my own reader-writer lock, and you can bet I would not attempt the "slim" variant (i.e. with all the tricky lock-free stuff...I could see using one volatile flag, but otherwise just stick with conventional Monitor-based code).

Peter Duniho
  • 68,759
  • 7
  • 102
  • 136
  • Oh, I think I wasn't clear enough: I don't plan on making _any_ changes to the decompiled code. I only removed attributes that are internal to System.Core. I have no intention of actually implementing an RW lock - I simply want to fix the one known bug. – xxbbcc Oct 24 '14 at 01:01
  • Ah, okay. I see what you mean now. I think that technically that should work fine. However, I'd still be hesitant to do it, because that sort of thing gets entrenched in the code base and you wind up not getting the benefit of future changes to the official version (performance improvements or even bug-fixes). You can always switch back in the future, in theory. But for some reason in practice this takes a very long time to happen. Is there some reason you don't want to or can't use the ReaderWriterLock class? – Peter Duniho Oct 24 '14 at 01:13
  • I'm trying to create a wrapper around this that will hide the actual lock instance. That way in the future when ReaderWriterLockSlim is fixed everywhere (or we upgrade to server 2012 R2) I only need to update one place to change the lock type name from MyRWLock to ReaderWriterLockSlim. I can't think of a safer solution within the tima allocated for this fix on my project. – xxbbcc Oct 24 '14 at 14:43
  • Why not just use ReaderWriterLock instead of ReaderWriterLockSlim? At least that one's built into .NET already. – Peter Duniho Oct 24 '14 at 15:48
  • Oh, I forgot to reply to that - ReaderWriterLock is very slow and is not even recommended by Microsoft anymore. I read several problem stories about it in the past (prone to deadlocks, etc.) but I couldn't find any to link in for you. – xxbbcc Oct 24 '14 at 16:24
  • That seems like a pretty vague reason to avoid ReaderWriterLock. Deadlocks happen any time someone's not careful using synchronization so rumors of deadlocks when using a particular synchronization object mean nothing. As for "slow", well...yes, it's not as efficient an object as ReaderWriterLockSlim. But do you have any evidence that the performance difference matters in your case? The only reason Microsoft doesn't recommend it anymore is that they offer a better implementation in the form of RWLS...that doesn't mean the previous version is no good. – Peter Duniho Oct 24 '14 at 16:29
  • 1
    I just realized that I do have a reason other than speed (which I haven't measured): RWL doesn't have an upgradeable read mode; RWLS does. I want to use that feature in a few places. – xxbbcc Oct 24 '14 at 18:03