Hello, all.

My problem is the following:

Numerous threads are waiting on an event to acquire the read lock, and one thread is waiting on an event to acquire the write lock. No thread holds the lock at that moment.

0:173> !do 0x0000000001c679f8
Name:        System.Threading.ReaderWriterLockSlim
...
Value Name
1 fIsReentrant
0 myLock
1 numWriteWaiters
28 numReadWaiters
0 numWriteUpgradeWaiters
0 numUpgradeWaiters
0 fNoWaiters
-1 upgradeLockOwnerId
-1 writeLockOwnerId
000000000381eb38 writeEvent
00000000035a32e0 readEvent
0000000000000000 upgradeEvent
0000000000000000 waitUpgradeEvent
9 lockID
0 fUpgradeThreadHoldingRead
1073741824 owners
0 fDisposed

0:173> .formats 0n1073741824
Evaluate expression:
  Hex:     00000000`40000000

From ReaderWriterLockSlim.cs:

private const uint WAITING_WRITERS = 0x40000000;
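To make the reading of `owners` explicit, here is a minimal sketch that decodes the field. Only `WAITING_WRITERS` is quoted from the source above; the other three constants are my recollection of the bit layout in the .NET Framework's ReaderWriterLockSlim.cs and should be checked against the version actually in use. `OwnersDecoder` and `Describe` are hypothetical names.

```csharp
using System;

static class OwnersDecoder
{
    // Quoted above from ReaderWriterLockSlim.cs.
    private const uint WAITING_WRITERS = 0x40000000;

    // Assumed from the same file; verify against your framework version.
    private const uint WRITER_HELD = 0x80000000;
    private const uint WAITING_UPGRADER = 0x20000000;
    private const uint READER_MASK = 0x10000000 - 1;

    // Splits the packed 'owners' value into its flag bits and reader count.
    public static string Describe(uint owners)
    {
        return string.Format(
            "writerHeld={0} waitingWriters={1} waitingUpgrader={2} readers={3}",
            (owners & WRITER_HELD) != 0,
            (owners & WAITING_WRITERS) != 0,
            (owners & WAITING_UPGRADER) != 0,
            owners & READER_MASK);
    }
}
```

Under those assumptions, `OwnersDecoder.Describe(1073741824)` reports that only the waiting-writers bit is set: no writer holds the lock and the reader count is zero, which matches the dump.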

At first I assumed that thread aborts were corrupting the lock's state in this way. That problem is easy to reproduce: http://chabster.blogspot.com/2013/07/a-story-of-orphaned-readerwriterlockslim.html.

I changed every lock usage to look like this:

try {} finally { lock.EnterXYZ(); }
try { /* resource usage code */ } finally { lock.ExitXYZ(); }

and was sure aborts could happen only within try { /* resource usage code */ }.
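For concreteness, a compilable version of that pattern might look like the sketch below. `GuardedCounter`, `RwLock`, and `counter` are hypothetical names used only for illustration. The empty `try` with the acquire in `finally` relies on the .NET Framework delaying a thread abort until a running `finally` block completes. The sketch does not settle whether an abort can land between the two blocks.

```csharp
using System.Threading;

static class GuardedCounter
{
    // Hypothetical shared state protected by the lock.
    private static readonly ReaderWriterLockSlim RwLock = new ReaderWriterLockSlim();
    private static int counter;

    public static int Read()
    {
        // Acquire inside a finally block so an abort cannot interrupt EnterReadLock.
        try { } finally { RwLock.EnterReadLock(); }
        try { return counter; }
        finally { RwLock.ExitReadLock(); }
    }

    public static void Increment()
    {
        // Same shape for the write lock.
        try { } finally { RwLock.EnterWriteLock(); }
        try { counter++; }
        finally { RwLock.ExitWriteLock(); }
    }
}
```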

Now I have another dump with the same problem, and I have run out of ideas.

I should add that this happens from time to time on 24-core environments. Could it be an RWLS bug on HT/multicore/multiprocessor systems? I see that the ReaderWriterLockSlim class updates its members without interlocked instructions, which might be a problem on multicore environments.

PS: I'd like to hear from ReaderWriterLockSlim author, Joe Duffy, but couldn't reach him via email.

NoRegsz
  • It just isn't good enough, the code can still be aborted between the first finally and the second try in the x64 jitter. Stop aborting threads. – Hans Passant Dec 09 '13 at 12:42
  • Your try/finally sample code looks strange and invalid, and the usage pattern is probably where the cause lies. Try to make an SSCCE. – H H Dec 09 '13 at 12:43
  • I'm pretty sure there were no thread aborts. That was just my guess. Now I believe I guessed wrong, and I am trying to find the root cause elsewhere. – NoRegsz Dec 09 '13 at 14:28
  • @HansPassant "the code can still be aborted between the first finally and the second try" I'm sure this is not true. There are no CPU instructions jitted between those regions. The thread is either still inside the finally or has already entered the try. – NoRegsz Dec 09 '13 at 14:30
  • Without a [SSCCE](http://sscce.org/), the best we can do is guess. Create a small example that illustrates the error. It's highly unlikely that you have discovered a bug in the `ReaderWriterLockSlim`. – Jim Mischel Dec 09 '13 at 15:06
  • @JimMischel If I had an example or a reproduction scenario, I would have fixed this a long time ago. It happens rarely, and only on customer environments. I only have logs and dmp files for analysis. Corruption caused by thread aborts was easy to reproduce and fix. But this must be something else. – NoRegsz Dec 09 '13 at 15:14
  • Still, you leave us guessing, and that gets old quickly. – H H Dec 09 '13 at 20:02
  • Very late to this one, but can this article explain your problem? http://chabster.blogspot.com.au/2013/12/readerwriterlockslim-fails-on-dual.html – Mike Chamberlain May 12 '15 at 01:02
  • @MikeChamberlain, lol, I think you actually referred the OP to one of his own articles. ;) – BlueStrat Jun 29 '17 at 01:17
