My program spawns pthreads that wait on condition variables and may be cancelled while waiting. This works for a while and then stops working: a newly created thread can no longer acquire the mutex.

Below is the SSCCE. The routine follows the well-known canonical examples for pthread_cond_wait, except that I don't wrap the wait in a while (condition not true) loop, because the condition is never signaled (if I do add the loop, the same problem appears).

For brevity, I am not checking the return values of the calls here, but they all succeed.

#include <stdio.h>
#include <semaphore.h>
#include <pthread.h>
#include <unistd.h>

pthread_mutex_t global_lock;
pthread_cond_t global_cond;

void *routine(void *arg) {
    pthread_mutex_lock(&global_lock);
    printf("waiting...\n");

    /*the condition will never arrive*/
    pthread_cond_wait(&global_cond, &global_lock);
    pthread_mutex_unlock(&global_lock);
    return NULL;
}

int main() {
    pthread_t thread;

    pthread_cond_init(&global_cond, NULL);
    pthread_mutex_init(&global_lock, NULL);

    while (1) {
        pthread_create(&thread, NULL, routine, NULL);

        /*
        ** give thread enough time to start waiting
        ** should perhaps have thread signal us it started waiting
        ** but this is good enough for this example
        */
        sleep(1);

        pthread_cancel(thread);
    }

    return 0;
}

I would expect this to print waiting... every second. But it does not: it prints it only twice. The third time, routine is not able to acquire the mutex at all. Why?

Mark Galeck
  • Threads are not cancelled. If you run `ps -eLf` you will see that more and more threads are started and none are terminated. My guess is that `pthread_cond_wait` acts as a cancellation point only once, when the function is entered. Setting the cancellation type to `PTHREAD_CANCEL_ASYNCHRONOUS` "fixes" the problem (but with a potential leak of resources). – gudok May 07 '16 at 06:58
  • @gudok I did not include checking of returns, but they are all successful, so `pthread_cancel` succeeds. – Mark Galeck May 07 '16 at 07:32
  • A zero return value from `pthread_cancel` means only that the cancel request was queued. In particular, it doesn't wait until the target thread terminates. – gudok May 07 '16 at 07:38
  • `man pthread_cond_wait`: *pthread_cond_wait and pthread_cond_timedwait are cancellation points. If a thread is cancelled while suspended in one of these functions, the thread immediately resumes execution, then locks again the mutex argument to pthread_cond_wait and pthread_cond_timedwait, and finally executes the cancellation.* – EOF May 07 '16 at 09:11
  • @EOF please help me understand this. I read this manpage and to me this does not make sense, because, if `pthread_cond_wait` is a cancellation point, then the mutex would be reacquired but then the thread is cancelled, so the mutex is never released. It does not make sense to prevent release of a mutex. Why would `pthread_cond_wait` behave this way? – Mark Galeck May 07 '16 at 13:50
  • @gudok it seems that, to both fix this problem and prevent a "leak of resources", I should disable cancellation at the beginning of that thread, then enable it (ASYNCHRONOUS) only at specific points, namely while I am not holding the mutex, and disable it again before acquiring the mutex. Does this make sense? – Mark Galeck May 07 '16 at 13:56
  • @gudok no it does not :) never mind – Mark Galeck May 07 '16 at 14:41
  • @EOF never mind I do understand now, thank you – Mark Galeck May 07 '16 at 15:47
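
The behaviour quoted from the man page in the comments above can be checked directly. The following is a minimal sketch, not the asker's program: the names `demo_lock`, `demo_cond`, `demo_waiting`, `waiter_no_cleanup` and `mutex_state_after_cancel` are made up for illustration. It cancels a thread blocked in `pthread_cond_wait`, joins it to be sure it has really terminated, and then probes the mutex with `pthread_mutex_trylock`.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical names, chosen for this sketch only. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_cond = PTHREAD_COND_INITIALIZER;
static int demo_waiting = 0; /* protected by demo_lock */

/* Like the question's routine: waits with no cleanup handler installed. */
static void *waiter_no_cleanup(void *arg) {
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    demo_waiting = 1;
    /* Never signalled; loop so a spurious wakeup cannot end the wait. */
    for (;;)
        pthread_cond_wait(&demo_cond, &demo_lock);
    return NULL; /* not reached */
}

/* Returns the result of pthread_mutex_trylock after the waiter is gone. */
int mutex_state_after_cancel(void) {
    pthread_t thread;
    void *result = NULL;
    int rc;

    rc = pthread_create(&thread, NULL, waiter_no_cleanup, NULL);
    assert(rc == 0);

    /* Once we can take the lock and see demo_waiting == 1, the thread
       must have released the mutex inside pthread_cond_wait. */
    for (;;) {
        int is_waiting;
        pthread_mutex_lock(&demo_lock);
        is_waiting = demo_waiting;
        pthread_mutex_unlock(&demo_lock);
        if (is_waiting)
            break;
        sched_yield();
    }

    rc = pthread_cancel(thread);      /* only queues the request */
    assert(rc == 0);
    rc = pthread_join(thread, &result); /* waits for actual termination */
    assert(rc == 0);
    assert(result == PTHREAD_CANCELED);

    /* The cancelled thread re-locked the mutex and never unlocked it. */
    return pthread_mutex_trylock(&demo_lock);
}
```

On a typical Linux/glibc system with a default (non-robust) mutex, the call should return `EBUSY`: the thread was cancelled, but it re-acquired the mutex on its way out and nothing released it, which is why the next thread in the question blocks in `pthread_mutex_lock`.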

1 Answer

After some thinking, I was able to fix the problem using EOF's comment. If EOF wants to post an answer himself, or copy and paste part or all of my answer, I will accept his answer.

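The problem with my code is that it does not establish a "cleanup handler" for the cancellation point at pthread_cond_wait. When the thread is cancelled there, it re-locks the mutex and then terminates without ever unlocking it, so the next thread blocks in pthread_mutex_lock. The corrected code for routine is as follows, and now it works as expected.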

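/* Runs if the thread is cancelled while the handler is pushed:
   releases the mutex that pthread_cond_wait re-acquired. */
void cleanup_handler(void *plock) {
    pthread_mutex_unlock(plock);
}

void *routine(void *arg) {
    /* register the unlock before reaching the cancellation point */
    pthread_cleanup_push(cleanup_handler, &global_lock);

    pthread_mutex_lock(&global_lock);
    printf("waiting...\n");

    /*the condition will never arrive*/
    pthread_cond_wait(&global_cond, &global_lock);
    pthread_mutex_unlock(&global_lock);

    /* must pair with pthread_cleanup_push in the same scope;
       0 means: remove the handler without running it */
    pthread_cleanup_pop(0);
    return NULL;
}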
Mark Galeck
  • Note the `pthread_cleanup_pop` call at the end. The compiler gave me troubles because I didn't call that function. (For more information about that look at the man page) – GitProphet Oct 17 '19 at 13:57
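
A common variation on the fix above, shown here only as a sketch under assumptions: the handler is pushed right after the lock is taken, the wait sits in a predicate loop, and `pthread_cleanup_pop(1)` both removes the handler and runs it, so the unlock is written once. The names `safe_lock`, `safe_cond`, `safe_waiting`, `safe_ready`, `unlock_mutex`, `waiter_with_cleanup` and `mutex_free_after_cancel` are hypothetical, and `safe_ready` is a stand-in predicate that nothing ever sets.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical names, chosen for this sketch only. */
static pthread_mutex_t safe_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t safe_cond = PTHREAD_COND_INITIALIZER;
static int safe_waiting = 0; /* protected by safe_lock */
static int safe_ready = 0;   /* predicate; never set in this sketch */

/* Cleanup handler: releases the mutex passed as its argument. */
static void unlock_mutex(void *arg) {
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *waiter_with_cleanup(void *arg) {
    (void)arg;
    pthread_mutex_lock(&safe_lock);
    /* From here on, cancellation will unlock the mutex for us. */
    pthread_cleanup_push(unlock_mutex, &safe_lock);
    safe_waiting = 1;
    while (!safe_ready)
        pthread_cond_wait(&safe_cond, &safe_lock);
    /* Normal exit path: pop the handler and run it (unlocks). */
    pthread_cleanup_pop(1);
    return NULL;
}

/* Returns 0 if the mutex is free again after cancelling the waiter. */
int mutex_free_after_cancel(void) {
    pthread_t thread;
    void *result = NULL;
    int rc;

    pthread_mutex_lock(&safe_lock);
    safe_waiting = 0;
    pthread_mutex_unlock(&safe_lock);

    rc = pthread_create(&thread, NULL, waiter_with_cleanup, NULL);
    assert(rc == 0);

    /* Wait until the thread is blocked inside pthread_cond_wait. */
    for (;;) {
        int is_waiting;
        pthread_mutex_lock(&safe_lock);
        is_waiting = safe_waiting;
        pthread_mutex_unlock(&safe_lock);
        if (is_waiting)
            break;
        sched_yield();
    }

    rc = pthread_cancel(thread);
    assert(rc == 0);
    rc = pthread_join(thread, &result); /* wait for real termination */
    assert(rc == 0);
    assert(result == PTHREAD_CANCELED);

    rc = pthread_mutex_trylock(&safe_lock);
    if (rc == 0)
        pthread_mutex_unlock(&safe_lock);
    return rc;
}
```

Joining the cancelled thread, rather than sleeping, also avoids the growing pile of unreaped threads mentioned in the comments on the question, since `pthread_cancel` by itself does not wait for the target to finish.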