
I implemented this reader/writer lock in GCD, and it deadlocks in a parallel test. Can I get an explanation for why it fails?

This is for iOS development. The code is written in Objective-C. I wrote an RWCache class with a reader/writer lock in GCD for data protection.

@interface RWCache : NSObject

- (void)setObject:(id)object forKey:(id <NSCopying>)key;

- (id)objectForKey:(id <NSCopying>)key;

@end

@interface RWCache()
@property (nonatomic, strong) NSMutableDictionary *memoryStorage;
@property (nonatomic, strong) dispatch_queue_t storageQueue;
@end

@implementation RWCache

- (instancetype)init {
    self = [super init];
    if (self) {
        _memoryStorage = [NSMutableDictionary new];
        _storageQueue = dispatch_queue_create("Storage Queue", DISPATCH_QUEUE_CONCURRENT);
    }
    return self;
}

- (void)setObject:(id)object forKey:(id <NSCopying>)key {
    // Writer: the barrier block runs exclusively, after all previously
    // queued blocks finish and before any later ones start.
    dispatch_barrier_async(self.storageQueue, ^{
        self.memoryStorage[key] = object;
    });
}

- (id)objectForKey:(id <NSCopying>)key {
    // Reader: dispatch_sync blocks the caller until the read completes,
    // but multiple reads can run concurrently on the queue.
    __block id object = nil;
    dispatch_sync(self.storageQueue, ^{
        object = self.memoryStorage[key];
    });
    return object;
}

@end


int main(int argc, const char * argv[]) {
    @autoreleasepool {
        RWCache *cache = [RWCache new];
        dispatch_queue_t testQueue = dispatch_queue_create("Test Queue", DISPATCH_QUEUE_CONCURRENT);
        dispatch_group_t group = dispatch_group_create();
        for (int i = 0; i < 100; i++) {
            dispatch_group_async(group, testQueue, ^{
                [cache setObject:@(i) forKey:@(i)];
            });
            dispatch_group_async(group, testQueue, ^{
                [cache objectForKey:@(i)];
            });
        }
        dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    }
    return 0;
}

If there is no deadlock, the program exits with 0; otherwise it hangs and never exits.

  • I'm not entirely sure why your code is deadlocking, but this seems like a ridiculously complicated way of sharing access to a dictionary. A simple `NSLock` would do the trick ... and be about 100 times more efficient. – James Bucanek Aug 02 '19 at 06:00
  • @JamesBucanek You're right. This is probably not efficient. I just tried to implement a reader/writer lock in GCD to see how it works, and I also want to know why it causes a deadlock in this demo. – HuangKun3251 Aug 02 '19 at 10:53
  • I would not assume it’s less efficient. Benchmark it. The last time I benchmarked it, reader-writer was a lot faster... – Rob Aug 02 '19 at 15:20
  • @Rob, fair enough. But I would like to make the distinction between *fast* and *efficient*. I acknowledge that in some extreme edge case, it might be possible for a read-write scheme like this to run faster for an `O(1)` operation like this—but it can't possibly be more *efficient*. My proof is that a read operation will involve at least two thread switches, plus attendant queuing and dequeuing of the blocks, controlled by (probably multiple) semaphore and barrier operations. That cannot possibly beat a single (probably uncontested) semaphore. ("efficient" defined as work done per watt) – James Bucanek Aug 03 '19 at 18:44
  • @Rob Sidebar: for the OP this is an exercise, which is fine. My point is about real-world applications, especially ones targeting mobile devices—although I think us desktop developers should also do our part in making software more efficient. – James Bucanek Aug 03 '19 at 18:45
  • Here are [some old benchmarks that I did](https://stackoverflow.com/a/20939025/1271826). Maybe they've changed the `NSLock` implementation, but back in the day, it was not great (because it was using mutex locks behind the scenes, IIRC). Nowadays if you wanted to use old-school locks, you might use the new `os_unfair_lock` as outlined in that 2016 WWDC video. Plus, in many real-world situations, the reader-writer's ability to permit concurrent reads is a game changer. But, regardless, `NSLock` certainly isn't "100 times more efficient". Lol. – Rob Aug 03 '19 at 19:40
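For reference, the `NSLock`-based alternative debated in these comments might look something like this sketch (a hypothetical `LockCache` class, not from the original post):

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical NSLock-based cache, sketching the alternative the
// commenters discuss; class and ivar names are illustrative.
@interface LockCache : NSObject
- (void)setObject:(id)object forKey:(id <NSCopying>)key;
- (id)objectForKey:(id <NSCopying>)key;
@end

@implementation LockCache {
    NSMutableDictionary *_storage;
    NSLock *_lock;
}

- (instancetype)init {
    self = [super init];
    if (self) {
        _storage = [NSMutableDictionary new];
        _lock = [NSLock new];
    }
    return self;
}

- (void)setObject:(id)object forKey:(id <NSCopying>)key {
    [_lock lock];            // readers and writers all take the same lock
    _storage[key] = object;
    [_lock unlock];
}

- (id)objectForKey:(id <NSCopying>)key {
    [_lock lock];            // no concurrent reads, but no thread hops either
    id object = _storage[key];
    [_lock unlock];
    return object;
}

@end
```

Unlike the GCD version, a read here never hops to another thread; the trade-off is that reads can no longer run concurrently with one another.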

1 Answer


The problem is not the reader/writer pattern, per se, but rather the general thread explosion in this code. See the "Thread Explosion Causing Deadlock" discussion in the WWDC 2015 video Building Responsive and Efficient Apps with GCD. The WWDC 2016 video Concurrent Programming With GCD in Swift 3 is also good. In both of those links, I'm dropping you off at the relevant portion of the video, but both are worth watching in their entirety.

Bottom line, you are exhausting the very limited number of worker threads in GCD's thread pool. There are only 64. But you've got 100 writes with barriers, which means that each dispatched block can't run until everything previously queued on that queue is done. These are interspersed with 100 reads which, because they are synchronous, block the worker thread from which they were dispatched until they return.

Let’s reduce this down to something simpler that manifests the problem:

dispatch_queue_t queue1 = dispatch_queue_create("queue1", DISPATCH_QUEUE_CONCURRENT);
dispatch_queue_t queue2 = dispatch_queue_create("queue2", DISPATCH_QUEUE_CONCURRENT);
for (int i = 0; i < 100; i++) {
    dispatch_async(queue2, ^{
        dispatch_barrier_async(queue1, ^{
            NSLog(@"barrier async %d", i);
        });
        dispatch_sync(queue1, ^{
            NSLog(@"sync %d", i);
        });
    });
}
NSLog(@"done dispatching all blocks to queue1");

That produces something like:

starting
done dispatching all blocks to queue1
barrier async 0
sync 0

And it deadlocks.

But if we constrain it so that no more than, say, 30 items can run concurrently on queue2, the problem goes away:

dispatch_queue_t queue1 = dispatch_queue_create("queue1", DISPATCH_QUEUE_CONCURRENT);
dispatch_queue_t queue2 = dispatch_queue_create("queue2", DISPATCH_QUEUE_CONCURRENT);
dispatch_semaphore_t semaphore = dispatch_semaphore_create(30);

for (int i = 0; i < 100; i++) {
    dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER);
    dispatch_async(queue2, ^{
        dispatch_barrier_async(queue1, ^{
            NSLog(@"barrier async %d", i);
        });
        dispatch_sync(queue1, ^{
            NSLog(@"sync %d", i);
        });
        dispatch_semaphore_signal(semaphore);
    });
}
NSLog(@"done dispatching all blocks to queue1");

Or, another approach is to use dispatch_apply, which is effectively a parallelized for loop, but limits the number of concurrent tasks at any given moment to the number of cores on your machine (keeping us well below the threshold that exhausts the worker thread pool):

dispatch_queue_t queue1 = dispatch_queue_create("queue1", DISPATCH_QUEUE_CONCURRENT);
dispatch_queue_t queue2 = dispatch_queue_create("queue2", DISPATCH_QUEUE_CONCURRENT);

dispatch_apply(100, queue2, ^(size_t i) {
    dispatch_barrier_async(queue1, ^{
        NSLog(@"barrier async %ld", i);
    });
    dispatch_sync(queue1, ^{
        NSLog(@"sync %ld", i);
    });
});
NSLog(@"done dispatching all blocks to queue1");
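Applied back to the original test harness, the dispatch_apply approach might look like this sketch (RWCache reproduced from the question so the program is self-contained):

```objectivec
#import <Foundation/Foundation.h>

// RWCache as defined in the question.
@interface RWCache : NSObject
- (void)setObject:(id)object forKey:(id <NSCopying>)key;
- (id)objectForKey:(id <NSCopying>)key;
@end

@interface RWCache ()
@property (nonatomic, strong) NSMutableDictionary *memoryStorage;
@property (nonatomic, strong) dispatch_queue_t storageQueue;
@end

@implementation RWCache

- (instancetype)init {
    self = [super init];
    if (self) {
        _memoryStorage = [NSMutableDictionary new];
        _storageQueue = dispatch_queue_create("Storage Queue", DISPATCH_QUEUE_CONCURRENT);
    }
    return self;
}

- (void)setObject:(id)object forKey:(id <NSCopying>)key {
    dispatch_barrier_async(self.storageQueue, ^{
        self.memoryStorage[key] = object;
    });
}

- (id)objectForKey:(id <NSCopying>)key {
    __block id object = nil;
    dispatch_sync(self.storageQueue, ^{
        object = self.memoryStorage[key];
    });
    return object;
}

@end

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        RWCache *cache = [RWCache new];
        dispatch_queue_t testQueue = dispatch_queue_create("Test Queue", DISPATCH_QUEUE_CONCURRENT);

        // dispatch_apply keeps the number of in-flight iterations near the
        // core count, so the barrier writes and synchronous reads can no
        // longer exhaust GCD's pool of worker threads.
        dispatch_apply(100, testQueue, ^(size_t i) {
            [cache setObject:@(i) forKey:@(i)];
            [cache objectForKey:@(i)];
        });

        // dispatch_apply returns only after every iteration has finished,
        // so the dispatch_group bookkeeping from the original test is
        // unnecessary.
    }
    return 0;
}
```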
Rob
  • I had my own guess and wrote it [here](https://github.com/huang-kun/Deadlock) before I saw your answer. This is fantastic. Please let me know if my assumption is similar to yours. – HuangKun3251 Aug 02 '19 at 11:17
  • Yeah, same idea, though I might quibble as I think any discussion is incomplete without talking about "worker threads" and GCD's "pool" of worker threads. This is a constraint (though not an entirely unreasonable one) imposed by GCD. I'd refer you to the [Thread Explosion Causing Deadlock](https://developer.apple.com/videos/play/wwdc2015/718/?time=2020) video. – Rob Aug 02 '19 at 16:18