(This question might be somewhat related to "pthread_exit in signal handler causes segmentation fault".)

I'm writing a deadlock prevention library in which a checker thread is always running: it does the graph work, checks whether there is a deadlock, and if so signals one of the conflicting threads. When that thread catches the signal, it releases all the mutexes it owns and exits. There are multiple resource mutexes (obviously) and one critical-region mutex; every call that acquires or releases a resource lock or does graph calculations must obtain this lock first.

Here is the problem. With 2 competing threads (not counting the checker thread), the program sometimes deadlocks after one thread gets killed. gdb says the dead thread owns the critical-region lock and never released it. After adding a breakpoint in the signal handler and stepping through, the lock appears to belong to someone else (as expected) right before `pthread_exit()`, but the ownership magically goes to this thread after `pthread_exit()`.

My only guess is that the thread to be killed was blocked in `pthread_mutex_lock`, trying to take the critical-region lock (because it wanted another resource mutex), when the signal arrived and interrupted `pthread_mutex_lock`. Since this call is not signal-proof, something weird happened? Perhaps the signal handler returned, and that thread got the lock and then exited? I don't know. Any insight is appreciated!

-
Who locks the periodic checker against race conditions? Do you somehow acquire a lock against the acquisition of any locks? Timers can execute at any time, and a signal handler can interrupt anything including `pthread_mutex_lock` (even when it's *not* blocking). I don't know what specific measures Linux provides to get around this, but you might mention what you've done. – Potatoswatter Dec 03 '12 at 16:58
-
see if the following links could help you: (1) http://stackoverflow.com/questions/13305422/how-to-kill-the-management-thread-with-c (2) http://stackoverflow.com/questions/13309415/how-to-execute-a-handler-function-before-quit-the-program-when-receiving-kill-si – MOHAMED Dec 03 '12 at 17:06
-
@Potatoswatter not sure if I fully understand your question. The checker only acquires the cr lock and checks for cycles in the resource graph; if it finds one, it signals someone to die and goes back to sleep for 1s. That's all it does. Specifically, in the test case I have 2 threads and 2 resource mutexes. Race conditions do occur, but in the worst case the checker will figure this out after 1s and kill someone. This does work, occasionally.. – fy_iceworld Dec 03 '12 at 17:11
-
I think my answer already covers things pretty well, but the text I boldfaced is particularly important to the issue you seem to be having: no matter what you do, there's no safe way to terminate a thread that's waiting to lock a mutex. – R.. GitHub STOP HELPING ICE Dec 03 '12 at 17:44
1 Answer
`pthread_exit` is not async-signal-safe, and thus the only way you can call it from a signal handler is if you ensure that the signal is not interrupting any non-async-signal-safe function.
As a general principle, using signals as a method of communication with threads is usually a really bad idea. You end up mixing two issues that are already difficult enough on their own: thread-safety (proper synchronization between threads) and reentrancy within a single thread.
If your goal with signals is just to instruct a thread to terminate, a better mechanism might be `pthread_cancel`. To use this safely, however, the thread that will be cancelled must set up cancellation handlers at the proper points and/or disable cancellation temporarily when it's not safe (with `pthread_setcancelstate`). Also, be aware that `pthread_mutex_lock` is not a cancellation point. **There's no safe way to interrupt a thread that's blocked waiting to obtain a mutex**, so if you need interruptibility like this, you probably need either a more elaborate synchronization setup with condition variables (condvar waits are cancellable), or you could use semaphores instead of mutexes.
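As a rough illustration of the cancellation approach, here is a minimal sketch in which a thread waits on a condition variable with a cleanup handler pushed, so that cancelling it leaves the mutex unlocked. The names `worker`, `unlock_on_cancel`, `cancel_demo`, and the `ready` flag are made up for this example and are not from the question's library.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool ready = false;  /* never set in this sketch, so the worker waits until cancelled */

/* Cleanup handler: runs if the thread is cancelled while the mutex is held. */
static void unlock_on_cancel(void *arg)
{
    pthread_mutex_unlock(arg);
}

static void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&lock);               /* not a cancellation point */
    pthread_cleanup_push(unlock_on_cancel, &lock);
    while (!ready)
        pthread_cond_wait(&cond, &lock);     /* cancellation point; the mutex is
                                                held again when handlers run */
    pthread_cleanup_pop(1);                  /* normal path: pop and run the unlock */
    return NULL;
}

/* Hypothetical driver: returns true if the worker was cancelled and the
   mutex was left unlocked by the cleanup handler. */
bool cancel_demo(void)
{
    pthread_t t;
    void *result = NULL;

    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return false;
    pthread_cancel(t);                       /* acted on at the condvar wait */
    if (pthread_join(t, &result) != 0 || result != PTHREAD_CANCELED)
        return false;

    if (pthread_mutex_trylock(&lock) != 0)   /* would fail if the lock had leaked */
        return false;
    pthread_mutex_unlock(&lock);
    return true;
}
```

The key point is that the only place the worker can be cancelled is inside `pthread_cond_wait`, where the state of the mutex is well defined, rather than at an arbitrary instruction as with a signal handler.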
Edit: If you really do need a way to terminate threads waiting for mutexes, you could replace calls to `pthread_mutex_lock` with calls to your own function that loops calling `pthread_mutex_timedlock` and checking for an exit flag on each timeout.
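A wrapper along those lines might look like the sketch below. The function name `lock_or_exit`, the 100 ms polling interval, the use of a C11 `atomic_bool` for the exit flag, and the choice of `ECANCELED` as the "asked to exit" return value are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: returns 0 with the mutex held, ECANCELED if
   *exit_flag was set before the mutex could be taken, or any other
   error code reported by pthread_mutex_timedlock. */
int lock_or_exit(pthread_mutex_t *m, atomic_bool *exit_flag)
{
    for (;;) {
        if (atomic_load(exit_flag))
            return ECANCELED;

        /* pthread_mutex_timedlock takes an absolute CLOCK_REALTIME deadline. */
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += 100L * 1000 * 1000;      /* now + 100 ms */
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec  += 1;
            deadline.tv_nsec -= 1000000000L;
        }

        int rc = pthread_mutex_timedlock(m, &deadline);
        if (rc != ETIMEDOUT)
            return rc;   /* 0 means acquired; anything else is a real error */
        /* Timed out: loop around and re-check the exit flag. */
    }
}
```

With this shape, the checker thread would set the flag instead of sending a signal, and the victim thread would see `ECANCELED`, release whatever it holds, and call `pthread_exit` from ordinary (non-handler) context.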

-
@user1095108: Not if you keep signals masked while calling those functions. But if any of them are blocking, it'll defeat the purpose of the signal. – R.. GitHub STOP HELPING ICE May 28 '20 at 22:48
-
Why? The purpose of the signal could be to interrupt said blocking functions. `epoll_wait()`, for example. – user1095108 May 28 '20 at 22:51
-
@user1095108: If it's an AS-safe function you're interrupting then yes that works fine. The problem is when it's an AS-unsafe function that's blocking. Then you can't `pthread_exit` from the handler. – R.. GitHub STOP HELPING ICE May 28 '20 at 22:59
-
Can you give some examples of AS-unsafe functions? Also, a signal can get deferred, so it will trigger after the AS-unsafe function exits. – user1095108 May 28 '20 at 23:06
-
@user1095108: Almost all standard functions are AS-unsafe. The only AS-safe ones are enumerated in https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03 – R.. GitHub STOP HELPING ICE May 29 '20 at 02:44