0

I am trying to implement a single-writer, multiple-reader queue in pthreads. The synchronization pattern works at first, but I believe it eventually deadlocks after repeated requests. With one writer (boss) thread and one reader (worker) thread it runs indefinitely, but with one boss thread and multiple worker threads it eventually hangs. When I take a backtrace in gdb, I see this:

// Boss:
Thread 1 (Thread 0x7ffff7fd1780 (LWP 21029)):
#0  0x00007ffff7bc44b0 in futex_wait
...

// Worker:
Thread 2 (Thread 0x7ffff42ff700 (LWP 21033)):
#0  0x00007ffff7bc39f3 in futex_wait_cancelable 
...

// Worker:
Thread 3 (Thread 0x7ffff3afe700 (LWP 21034)):
#0  0x00007ffff7bc39f3 in futex_wait_cancelable
...

To me, this looks like the workers are waiting for the signal while the boss is blocked and never sends it, but I don't know why that would happen.

I have tried this synchronization pattern:

// Boss:
pthread_mutex_lock(&queue_mutex);
queue_push(&queue, data);
pthread_cond_signal(&queue_condition);
pthread_mutex_unlock(&queue_mutex);
return;

// Worker(s):
pthread_mutex_lock(&queue_mutex);
while((queue_isempty(&queue)) > 0) { 
    pthread_cond_wait(&queue_condition, &queue_mutex);
}
data_t *data = queue_pop(&queue);
pthread_mutex_unlock(&queue_mutex);
do_work(data);

To the best of my knowledge, this is the correct synchronization pattern, but the evidence suggests I am not applying it correctly. Could someone help me understand why this single-writer, multiple-reader queue access in pthreads does not work as I intend?

caf
Anthony O
  • The posted code looks correct to me. The problem seems to be in some code you didn't post. Can you post a complete code example so that we can reproduce the problem? – Support Ukraine Sep 16 '19 at 06:07
  • BTW: The guideline for questions like this says: "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error **and the shortest code necessary to reproduce it in the question itself**" – Support Ukraine Sep 16 '19 at 06:09
  • Yes, I concur that this code looks correct. You should also show more of the backtrace - it should identify which lines in your code it's deadlocking on (it looks like some other thread must be holding the `queue_mutex`, preventing the producer thread from acquiring it). – caf Sep 16 '19 at 06:41
  • Does the "real" code also miss error checking completely? – alk Sep 16 '19 at 11:32

1 Answer

-3

Here is my best guess based on the available code snippet. The deadlock is probably caused by the workers holding the lock while waiting for the signal, so the boss never gets a chance to acquire the lock (while a worker is holding it) in order to send the signal. The following should avoid the deadlock.

// Boss:
pthread_mutex_lock(&queue_mutex);
queue_push(&queue, data);
pthread_mutex_unlock(&queue_mutex);
pthread_cond_signal(&queue_condition);
return;

// Worker(s):
while((queue_isempty(&queue)) > 0) {   //> assume queue_isempty(const void*);
    pthread_cond_wait(&queue_condition, &queue_mutex);
}
pthread_mutex_lock(&queue_mutex);
data_t *data = queue_pop(&queue);
pthread_mutex_unlock(&queue_mutex);
do_work(data);
KL-Yang
  • 1.: This `while((queue_isempty(&queue))` accesses `queue` concurrently and unprotected. Not good. 2.: The moment `pthread_cond_wait()` returns `queue_mutex` *is* locked by definition. No need to lock it again. – alk Sep 16 '19 at 11:29
  • @alk Those two are related - the line `pthread_mutex_lock(&queue_mutex);` was incorrectly moved to after the `while()` loop in this answer - and the call to `pthread_cond_signal(&queue_condition);` was similarly moved so that it runs without the mutex being locked. – Andrew Henle Sep 16 '19 at 11:56
  • @AndrewHenle: *I* know that this *one* move does not make sense; still, it introduces *two* bugs, as mentioned in my comment. – alk Sep 16 '19 at 12:07
  • `pthread_cond_wait()` releases the mutex while it waits (and the thread *must* have it locked when it calls the function). – caf Sep 16 '19 at 12:33
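
For anyone who wants something complete to compile, here is a minimal sketch of the pattern from the question with one boss and several workers. It is not the asker's code: the fixed-size array queue, the `done` shutdown flag, `QUEUE_CAP`, and the `run_demo` function are all assumptions made for illustration. The locking order is the one from the question: the predicate is checked with `queue_mutex` held, and `pthread_cond_wait()` releases the mutex while the worker is blocked, so the boss is able to lock it and push.

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 1024   /* hypothetical fixed capacity for this sketch */
#define MAX_WORKERS 64   /* hypothetical upper bound on worker threads */

static int items[QUEUE_CAP];
static int head, tail;   /* pop at head, push at tail */
static int done;         /* set by the boss once no more items will arrive */
static long processed;   /* items consumed so far, protected by queue_mutex */
static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_condition = PTHREAD_COND_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&queue_mutex);
        /* Loop guards against spurious wakeups. The mutex is released
           while blocked and re-acquired before pthread_cond_wait returns. */
        while (head == tail && !done)
            pthread_cond_wait(&queue_condition, &queue_mutex);
        if (head == tail) {              /* empty and done: shut down */
            pthread_mutex_unlock(&queue_mutex);
            return NULL;
        }
        int item = items[head++];
        processed++;
        pthread_mutex_unlock(&queue_mutex);
        (void)item;                      /* do_work(item) would run here, unlocked */
    }
}

/* Boss: starts n_workers threads, pushes n_items items, then shuts down.
   Returns the number of items consumed, or -1 on bad arguments or if a
   thread could not be created. */
long run_demo(int n_workers, int n_items)
{
    pthread_t tids[MAX_WORKERS];
    int created = 0;

    if (n_workers < 1 || n_workers > MAX_WORKERS ||
        n_items < 0 || n_items > QUEUE_CAP)
        return -1;
    head = tail = done = 0;
    processed = 0;

    while (created < n_workers &&
           pthread_create(&tids[created], NULL, worker, NULL) == 0)
        created++;

    for (int i = 0; i < n_items; i++) {
        pthread_mutex_lock(&queue_mutex);
        items[tail++] = i;
        pthread_cond_signal(&queue_condition);   /* signal while holding the mutex */
        pthread_mutex_unlock(&queue_mutex);
    }

    pthread_mutex_lock(&queue_mutex);
    done = 1;
    pthread_cond_broadcast(&queue_condition);    /* wake every worker to exit */
    pthread_mutex_unlock(&queue_mutex);

    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    return created == n_workers ? processed : -1;
}
```

Calling `run_demo(4, 1000)` from a `main` function should return 1000 once all workers have drained the queue and exited. If a sketch like this runs reliably while the real program still hangs, that would point to code outside the posted snippet, as the comments on the question suggest.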