
I have a program that is supposed to execute a number of functions, each in its own thread. Once one of them has finished (or a timeout is reached), all threads are terminated. As I don't have access to all of the individual functions (they are rather complicated external libraries), this is achieved by signalling the threads and jumping into a cleanup section, from where the threads then exit. For the most part, this works as intended.

However, sometimes one of the threads cannot be joined: the main thread sends a signal, the signal handler is invoked, execution jumps into the cleanup section and the section is executed, but `pthread_join()` then blocks in the main thread.

```c
#include <pthread.h>
#include <signal.h>
#include <stdio.h>

void *worker_thread(void *args) {
    // Install sig handler
    // If signal received, goto CLEANUP;
    // Launch function
CLEANUP:
    // Some cleanup
    puts("leaving thread"); // This is executed
    pthread_exit(NULL);
}

int main(void) {
    // Initialise data etc.
    pthread_t worker;
    pthread_create(&worker, NULL, worker_thread, args);
    // Some stuff
    if (timeout || done) {
        pthread_kill(worker, SIGUSR1); // signal not named in the original; SIGUSR1 assumed
        pthread_join(worker, NULL);    // execution blocks here
    }
    return 0;
}
```

I have experimented a lot with the surrounding code, but the problem really seems to come down to this `pthread_join()`. In case it's useful, strace shows:

```
[pid 19153] futex(0x7f1178000020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 19150] futex(0x7f119119c9d0, FUTEX_WAIT, 19153, NULL
strace: [ Process PID=19154 runs in 64 bit mode. ]
```

which, as far as I understand it, supports my hypothesis that the join/exit pair is blocking in some way.

In any case, if anyone has encountered something like this before or knows what might be causing it, any input would be massively helpful.

Edit:

I'm aware that signals are not an ideal way to do anything. However, I've tried a lot of non-signal solutions, and none of them work. I need to be able to terminate the execution of a function that may take years to run and over whose construction I have no control. As far as I can tell, only signals will do this. (If you are wondering: yes, this leaks all of the memory. It's a price I'm currently willing to pay.)

  • How do you signal threads? If you call `pthread_exit` from a signal handler that is not supposed to work, see https://man7.org/linux/man-pages/man7/signal-safety.7.html – Maxim Egorushkin Jul 16 '20 at 12:24
  • Answer one here: https://stackoverflow.com/questions/13687985/pthread-exit-in-signal-handler might help -- pthread_exit is not signal safe. Try to avoid signals in pthreads; layering a mess on a mess doesn't make a tidy. – mevets Jul 16 '20 at 12:24
  • I am not calling 'pthread_exit' from the signal handler, the handler invokes 'siglongjmp' to get back to the start of worker, and from there I jump to CLEANUP, where 'pthread_exit' is invoked. As far as I can tell, that should be safe, no? – Matthias Kaul Jul 16 '20 at 12:28
  • 2
    Why do you need all that? Just `exit(0)` from the main program. You cannot forcefully terminate a thread in any other way. – n. m. could be an AI Jul 16 '20 at 12:29
  • Well, I could just nuke the whole thing, that's true. In fact, that's what I did previously. However, I would like to avoid that scenario, as there are some things I want to do after terminating the threads. Also, there is a 'sigsetjmp' counterpart to the 'siglongjmp'. It jumps where I want, reliably. I doubt a naked 'siglongjmp' would even compile. – Matthias Kaul Jul 16 '20 at 12:39
  • 1
    So you left out the signal handler code, you left out the fact you use `siglongjmp()` to jump out of the signal handler, and you expect help? Not only that, the code you have posted for `main()` doesn't even represent what's happening - once "some stuff" is done, if either `timeout` or `done` is not set `main()` will call `exit()` and the entire program will finish. – Andrew Henle Jul 16 '20 at 12:40
  • `siglongjmp` is still executed in the context of your signal handler from the perspective of the rest of your application and the kernel. – Maxim Egorushkin Jul 16 '20 at 13:26
