1

Update 3

recently, I noticed that my code randomly causes Segmentation Fault errors. But I think that my code is pretty simple so far and I cant figure out where that error comes from. Since it happens randomly, I assume that there is some kind of race-condition. I think this is all the code that might be relevant, tell me if you need more:

namespace thread {
    pthread_t terminated_thread_id, /* and others */;
    pthread_mutex_t terminate_thread = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t terminate_thread_signal = PTHREAD_COND_INITIALIZER;
    int total_thread_count = 0;
    int termination; // + sembufs

    inline void* Exit(void* value) {
    //  This must be unlocked after all join-related jobs are done
        semop(thread::termination, thread::termination_in_process, 2)
            pthread_mutex_lock(&thread::terminate_thread);
                thread::terminated_thread_id = pthread_self();
                pthread_cond_signal(&thread::terminate_thread_signal);
            pthread_mutex_unlock(&thread::terminate_thread);

        pthread_exit(value);
        return value;
    }
}
int main(int argc, const char** argv){
...
    pthread_mutex_lock(&thread::terminate_thread);
    if(0 != pthread_create(&thread::communication_handler_thread_id, NULL,    \
                           CommunicationHandler, NULL)){
        global::PrintDebug("pthread_create() failed", __FILE__, __LINE__);
    }
    /** 2 more pthread_create()-calls */       
    do{
        thread::terminated_thread_id = pthread_self();
        pthread_cond_wait(&thread::terminate_thread_signal,                   \
                          &thread::terminate_thread);
        if(!pthread_equal(thread::terminated_thread_id, pthread_self())){
            pthread_join(thread::terminated_thread_id, NULL);
    ...
            semop(thread::termination, thread::termination_done, 1)
        }
    }while(thread::total_thread_count > 0);

    pthread_mutex_unlock(&thread::terminate_thread);
    return 0;
}

The signal terminate_thread_signal is only emitted in the thread::Exit() function. That function is also only called at the end of the function that is used to create the thread.

This is what the debugger shows for the Call Stack:

#0 (    0xb7fe2424 in __kernel_vsyscall() (??:??)
#1 0xb7fbdfcf   __pthread_cond_wait(cond=0x80539c0, mutex=0x8053998) (pthread_cond_wait.c:153)
#2 0x804a094    main(argc=1, argv=0xbffff9c4) (/home/papergay/SeekYourCar/0.2/Server/main.cpp:121)

What I already know is, if the error happens, then no thread has called thread::Exit() yet. I am also using an unnamed namespace with a few initializations (if that might be relevant). I am using Code::Blocks as IDE and GCC as compiler.

xQuare
  • 1,293
  • 1
  • 12
  • 21
  • I didn't read your code at all, but keep in mind that MinGW does some miraculous and wonderful things. For example, your application may not crash at the segmentation fault but the next time `free` is called. And I assume this happens in `pthread_cond_wait`. Are you sure that there is no segmentation fault / uninitialized memory anywhere else in your program? – Zeta Aug 05 '12 at 12:38
  • Starting off your thread count with `1` and then testing it for `0` looks bogus to me. Also your `pthread_exit` is useless, `return` does the jobs quite well. – Jens Gustedt Aug 05 '12 at 13:26
  • Okay, after having had some sleep I reviewed the thread and partly adjusted my code. And I also tried out your suggestion @Zeta. It does not seem to crash the next time `free()` is called - if you were talking about a semantically proper function call. The crash point remains the same. – xQuare Aug 07 '12 at 18:16
  • Are there other ways to locate my error than simply using the debugger? – xQuare Aug 11 '12 at 09:48

3 Answers3

1

pthread_cond_wait() is allowed to wake up spuriously, so you have to re-test the condition itself after every wakeup. This could be causing your problem - if the main thread wakes up spuriously before thread::terminated_thread_id has been set, it'll pass an invalid thread id to pthread_join().

There's also another problem in your code - there's no guarantee that the signalled thread will be the next to wake up after the mutex is unlocked, so it's possible for two threads to call thread::Exit() in quick succession, with the main thread not running until after the second exiting thread has unlocked the mutex. In this case you won't ever call pthread_join() on the first thread.

Something like this should fix those problems:

namespace thread {
    int terminate_thread_set = 0;
    pthread_mutex_t terminate_thread = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t terminate_thread_set_cond = PTHREAD_COND_INITIALIZER;
    pthread_cond_t terminate_thread_unset_cond = PTHREAD_COND_INITIALIZER;

    /* ... */

    inline void Exit(void* value)
    {
        pthread_mutex_lock(&thread::terminate_thread);
        while (thread::terminate_thread_set)
            pthread_cond_wait(&thread::terminate_thread_unset_cond);
        thread::terminated_thread_id = pthread_self();
        thread::terminate_thread_set = 1;
        pthread_cond_signal(&thread::terminate_thread_set_cond);
        pthread_mutex_unlock(&thread::terminate_thread);

        pthread_exit(value);
    }
}

and in main:

pthread_mutex_lock(&thread::terminate_thread);

/* ... */

while(thread::total_thread_count > 0) {
    while (!thread::terminate_thread_set)
        pthread_cond_wait(&thread::terminate_thread_set_cond, &thread::terminate_thread);
    thread::terminate_thread_set = 0;
    pthread_join(thread::terminated_thread_id, NULL);
    pthread_cond_signal(&thread::terminate_thread_unset_cond);
...
}
pthread_mutex_unlock(&thread::terminate_thread);

That's not to say that you don't have other issues, of course.

caf
  • 233,326
  • 40
  • 323
  • 462
  • And this does not cause a deadlock? Aren't the threads now waiting for both signals at the same time? – xQuare Aug 05 '12 at 14:06
  • @Papergay: No, the exiting threads are waiting for `terminate_thread_set` to be `0`, and the main thread is waiting for `terminate_thread_set` to be `1`. Note that neither thread waits unless the condition it's waiting for is false (and since the mutex is held, the condition can't change out from under it). – caf Aug 05 '12 at 14:37
  • I eliminated both flaws you mentioned. It did not solve my error though... Are there any other ways to locate the error? I will look into anything that might be relevant. Thanks in advance. – xQuare Aug 07 '12 at 18:39
  • @Papergay: You can't lock a mutex in one thread and unlock it in another. – caf Aug 08 '12 at 00:13
  • 1
    @Papergay: Only that [the spec for `pthread_mutex_lock()`](http://pubs.opengroup.org/onlinepubs/007904875/functions/pthread_mutex_lock.html) says it's undefined (or that an error is returned, if its an error-checking mutex), so implementions aren't required to support it. – caf Aug 08 '12 at 00:59
  • @user1548637: You should amend your question with your attempted fix. – Marcelo Cantos Sep 02 '12 at 06:49
0

It looks as if you're unlocking your termination_in_process mutex from your main process - even though it was locked by another thread - which is undefined behavior. It might work, and it might not work.

A solution could be to use a FIFO buffer (for instance a std::queue, or even just a std::vector) and push the thread id of your terminated threads to it in your Exit() function, then send out your signal, and let the main thread go through the buffer and join any threads in it.

If Exit() isn't called at the point of your segfault this shouldn't be the reason for your problem though, but it's still something you might want to fix.

sonicwave
  • 5,952
  • 2
  • 33
  • 49
  • Yeah I did not know about that undefined behavior at that point. And I thought I had improved my code from before xD. The documentation on pthreads isn't what I would call superb, though ... I will think of something else, as you said. Thanks. One more thing, where did you read or how do you know that it will be undefined behavior? – xQuare Aug 07 '12 at 21:12
  • 1
    Can be found here for instance: http://linux.die.net/man/3/pthread_mutex_lock - for `PTHREAD_MUTEX_DEFAULT`: "Attempting to unlock the mutex if it was not locked by the calling thread results in undefined behavior." – sonicwave Aug 08 '12 at 06:07
0

This is quite late, but I forgot to post it for future references. This is how I fixed it:

I upgraded my GCC compiler from version to 4.5.X to version 4.7.X as well as my kernel from 2.6.X to 3.2.X and fixed some errors regarding the global instanciating of a class and a static member variable by providing an explicit constructor in order to allow a global declaration without initialization. But I think that upgrading the GCC compiler is all what was needed.

Looks like the implementation of the function was not proper. Or there were some errors in the kernel code?

xQuare
  • 1,293
  • 1
  • 12
  • 21
  • While that is always a possibility, it is unlikely. More likely, some subtle change in the code generated by the new compiler has concealed your bug. You should post your amended code for review. – Marcelo Cantos Sep 02 '12 at 06:51
  • I did not change the code of how I handled the threads so I could only repeat myself. As written in my answer, I only changed the class structure of a class that was entirely unrelated to my thread handling but of which I created a global instance. – xQuare Sep 02 '12 at 06:53
  • Then your solution is definitely still broken. The standard form of waiting on a condition variable is as follows: `lock(&state_mutex); while (!ready(&state)) { cv.wait(&state_mutex); } alter(&state); unlock(&state_mutex);`. – Marcelo Cantos Sep 02 '12 at 07:07
  • I think my solution is not bogus? Would you mind to refer to my exact code if you still think that its bogus? It would be much appreciated! – xQuare Sep 02 '12 at 07:11
  • Sorry, I must have missed the while loop. I will say that `do…while` is unusual. It can lead to a thread waiting forever for a condition that will never occur. I don't know if that's the case for your code, though. – Marcelo Cantos Sep 02 '12 at 11:41
  • It will be like that if the threads never return. If the threads never start, then the program will terminate anyway (at the moment) – xQuare Sep 02 '12 at 11:45