Linux Threads suspend/resume

Question

I'm writing a program with two threads running in parallel.

The first is the main thread, which starts the second. The second thread simply executes an empty while loop.

Now I want the first thread to pause (suspend) the second thread that it created, and some time later resume it (by issuing some command or function call) from the point where it was paused.

score 19 · Answer 1 · answered Nov 15 '12 at 14:29

This question is not about how to use mutexes, but how to suspend a thread.

Some Unix systems provide a non-portable thread function called pthread_suspend_np, and another called pthread_resume_np (the _np suffix marks them as extensions outside POSIX), but for some reason the people who make Linux have not implemented these functions.

In short, the functions simply are not there. There are workarounds, but unfortunately none of them is the same as calling SuspendThread on Windows. You have to do all kinds of non-portable things with signals to make a thread stop and start.

Stopping and resuming threads is vital for debuggers and garbage collectors. For example, I have seen a version of Wine which is not able to properly implement the "SuspendThread" function. Thus any Windows program using it will not work properly.

I thought that it was possible to do it properly using signals, based on the fact that the JVM uses this signal technique for its garbage collector, but I have also just seen some articles online where people are noticing deadlocks and so on with the JVM, sometimes unreproducible.

So to come around to answer the question, you cannot properly suspend and resume threads with Unix unless you have a nice Unix that implements pthread_suspend_np. Otherwise you are stuck with signals.

The big problem with signals is when you have about five different libraries all linked into the same program and all trying to use the same signals at the same time. For this reason I believe that you cannot actually use something like Valgrind and, for example, the Boehm GC in one program, at least not without major coding at the very lowest levels of userspace.

Another answer to this question could be: do what Linus Torvalds does to NVidia, flip the finger at him, and get him to implement the two most critical parts missing from Linux: first, pthread_suspend_np, and second, a dirty bit on memory pages so that proper garbage collectors can be implemented. Start a large petition online and keep flipping that finger. Maybe by the time Windows 20 comes out, they will realise that suspending and resuming threads and having dirty bits are among the fundamental reasons Windows and Mac are better than Linux, or any Unix that does not implement pthread_suspend_np and a dirty bit on virtual pages, like VirtualAlloc does in Windows.

I do not live in hope. I spent a number of years planning my future around building things for Linux, but I have abandoned that hope, because anything reliable seems to hinge on the availability of a dirty bit for virtual memory and on suspending threads cleanly.

Part 1) Suspending a thread in Linux is easy. There is no "danger of deadlocks", it just works. *Resuming the thread* (and leaving the application functional after that) is the hard part. Java and other popular VM languages can get away with it thanks to system of "safepoints" — rather than being abruptly stopped anywhere, the threads are yielding control to GC in predefined locations: inside loops, on entry to or on exit from native methods etc. Additionally, Java defaults to reentrant locks (and javascript, for example, does not have locks at all), so there is not much space for deadlocks. — user1643723, Apr 06 '18 at 06:24
Part 2) Most systems, that "implement" suspending threads, aren't safe from deadlocks, they just push that responsibility to users. "SuspendThread" is not perfectly safe either, it just assumes, that you know, what you are doing. Linux is not to blame for lack of support — in fact most system calls are interruptible/reentrant, which is why signals work. Userspace libraries (most notably libc malloc and thread-local-storage) are the biggest offenders when it comes to reentrancy and cancellation. — user1643723, Apr 06 '18 at 06:30
VxWorks (not a Unix) implements taskSuspend and taskResume. Their implementation might provide some ideas. — Xofo, Dec 20 '18 at 02:12

score 4 · Answer 2 · answered Jul 13 '12 at 10:13

As far as I know you can't really just pause some other thread using pthreads. You have to have something in your 2nd thread that checks for times it should be paused using something like a condition variable. This is the standard way to do this sort of thing.

answered Jul 13 '12 at 10:13

Francis Upton IV

I don't know if you can do it at all, I was just commenting about pthreads. – Francis Upton IV Jul 13 '12 at 10:21
+1 ... and adding to that: Even if you _could_, then you _should not_ suspend threads. Always wait on a condition variable or a similar synchronization primitive, never do anything different. Suspending (or worse, killing) threads causes great evil. It is moderately difficult/easy to get a deadlock or unexpected results using synchronization primitives. However, it is a _normal thing_ that happens all the time when suspending (or killing) threads. And it's near impossible to debug... – Damon Jul 13 '12 at 11:01
@FrancisUpton Is sleeping with the help of a semaphore variable not pausing? – Sandeep Jul 13 '12 at 11:06
@happy2Help Not sure what you mean, but I mean that you can't explicitly pause *another* thread in pthreads explicitly. Of course threads will pause all the time for various reasons. – Francis Upton IV Jul 13 '12 at 11:08
@FrancisUpton perfectly Agree..!! – Sandeep Jul 13 '12 at 11:10
@FrancisUpton I have a question, Can i give the scheduler the process ID and ask the scheduler to pause it? – Sandeep Jul 13 '12 at 11:13
Well you kind of _can_ do it, using `pthread_kill(...SIGSTOP)`, though there is no explicit support, this would just do it. But I strongly advise against it. Suspending a thread stops its execution no matter where it is, no matter what it is doing (it might for example hold a lock, or have half-finished data). Blocking on a condvar/semaphore stops execution at a well-known point in time with well-known state, in a controlled manner. No surprises, no unknown side effects. – Damon Jul 13 '12 at 11:13
please see my solution and let me know if there is any logical flaw in it. Thanks – Sandeep Jul 13 '12 at 11:28
+1 for Damon. Suspending/killing etc. of threads is problematic, difficult, awkward to test, dangerous and a general PITA. The best way of suspending/killing threads is to just not do it at all. If developers could get past this miserable concept of continually creating, suspending, resuming, joining, waitingFor, terminating etc, etc then multithreading would become much simpler and actually become useful, high-performing and actual fun instead of some nightmare that should be avoided. Seriously, some web tutorials/textbooks, and their authors, should be burnt at the stake:) – Martin James Jul 14 '12 at 09:23

Shubham · Answer 3 · 2019-06-15T11:07:53.047

I tried suspending and resuming thread using signals, here is my solution. Please compile and link with -pthread.

Signal SIGUSR1 suspends the thread by calling pause() and SIGUSR2 resumes the thread.

From the man page of pause:

pause() causes the calling process (or thread) to sleep until a signal is delivered that either terminates the process or causes the invocation of a signal-catching function.

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <signal.h>

// Only two threads here, so two flags suffice;
// an array of flags would be more useful for `n` threads.
// _Atomic (C11) because the workers write these flags while main() polls them.
static _Atomic int is_th1_ready = 0;
static _Atomic int is_th2_ready = 0;

static void cb_sig(int signal)
{
        switch(signal) {
        case SIGUSR1:
                pause();
                break;
        case SIGUSR2:
                break;
        }
}

static void *thread_job(void *t_id)
{
        int i = 0;
        struct sigaction act;

        pthread_detach(pthread_self());
        sigemptyset(&act.sa_mask);
        act.sa_flags = 0;
        act.sa_handler = cb_sig;

        if (sigaction(SIGUSR1, &act, NULL) == -1) 
                printf("unable to handle siguser1\n");
        if (sigaction(SIGUSR2, &act, NULL) == -1) 
                printf("unable to handle siguser2\n");

        if (t_id == (void *)1)
            is_th1_ready = 1;
        if (t_id == (void *)2)
            is_th2_ready = 1;

        while (1) {
                printf("thread id: %p, counter: %d\n", t_id, i++);
                sleep(1);
        }

        return NULL;
}

int main()
{
        int terminate = 0;
        int user_input;
        pthread_t thread1, thread2;

        pthread_create(&thread1, NULL, thread_job, (void *)1);
        // Spawned thread2 just to make sure it isn't suspended/paused 
        // when thread1 received SIGUSR1/SIGUSR2 signal
        pthread_create(&thread2, NULL, thread_job, (void *)2);

        // wait until both threads have installed their signal handlers
        while (!is_th1_ready || !is_th2_ready);

        while (!terminate) {
                // to test, I am sending signals depending on input from STDIN
                printf("0: pause thread1, 1: resume thread1, -1: exit\n");
                scanf("%d", &user_input);

                switch(user_input) {
                case -1: 
                        printf("terminating\n");
                        terminate = 1;
                        break;
                case 0:
                        printf("raising SIGUSR1 to thread1\n");
                        pthread_kill(thread1, SIGUSR1);
                        break;
                case 1:
                        printf("raising SIGUSR2 to thread1\n");
                        pthread_kill(thread1, SIGUSR2);
                        break;
                }
        }

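        // Returning from main() ends the process and both detached threads with it.
        // SIGKILL cannot target a single thread: pthread_kill(thread1, SIGKILL)
        // would kill the whole process.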

        return 0;
}

This example is problematic for numerous reasons. It is not thread safe: you do not wait until thread1 and thread2 set their signal handlers before calling pthread_kill. If `scanf` reads from an actual human being typing into a graphical terminal, this might not matter. But as soon as someone runs this program with redirected stdin, you may run into a situation where sending SIGUSR1 does nothing (because a thread hasn't yet installed its signal handler). More importantly, your program may sometimes deadlock because printf isn't async-signal-safe, and you are calling it from a signal handler. — user1643723, Jun 15 '19 at 08:19
...Furthermore, even if you remove `printf` from cb_sig, you will find that pausing thread1 sometimes mysteriously pauses thread2 or even locks up your entire program. This is because `printf` isn't async-signal-safe: it takes a number of locks (heap lock, stdio lock), and does not release them until the call to `printf` returns. When a signal arrives *before* printf releases its lock, your entire program effectively can't call `printf`/`scanf`/`malloc` etc.: the locks they use are held by the now suspended thread and will block the caller until the suspendee releases them (so never). — user1643723, Jun 15 '19 at 08:23
Thanks for reviewing, `printf` are just there only for debugs, so they can be removed. I'll make sure that threads get ready before performing any actions. @user1643723 in your second comment first and second lines are contradiction each other, you said about removing `printf` in first and in second one you said the problems because of having `printf` in signal handler. Could you elaborate on that. — Shubham, Jun 15 '19 at 10:48
I used `scanf` just to test my code, eventually this should be event driven, may be if some condition arises suspend a thread or resume it. I thought `scanf` is the easiest way to test it. @user1643723, if you have any suggestion for testing the code please let me know. — Shubham, Jun 15 '19 at 10:56
you should learn what "async-signal-safe" means. Once you fully comprehend that, you will see, that in order to successfully pause (and resume!) a thread with signals that thread has to be limited to async-signal safe code only. Hint: async-signal-safety is a property of code as whole, it does not only apply only to signal handlers and system library functions. In effect, in order to write async-signal-safe code you have to use only async-signal-safe library functions, system calls, atomic data structures or data structures on stack. Nothing else will do. — user1643723, Jun 15 '19 at 11:11

score 2 · Answer 4 · answered Jul 13 '12 at 10:36

POSIX has no APIs like pthread_suspend() or pthread_resume().
Condition variables are the usual way to control the execution of other threads.

The condition variable mechanism allows threads to suspend execution and relinquish the processor until some condition is true. A condition variable must always be associated with a mutex. Without the mutex there is a race: one thread may be preparing to wait while another signals the condition before the first thread actually waits on it, so the wakeup is lost and the waiter can block forever.
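
As a rough illustration of the mechanism described above, here is a minimal sketch of a pause gate built from a mutex, a condition variable and a flag. The type `pause_gate` and the functions `gate_pause`, `gate_resume`, `gate_wait_if_paused` and `demo_pause_resume` are names made up for this example; the worker has to call the gate itself at points where pausing is safe.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: a gate the worker passes through at safe points. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  resumed;
    bool            paused;     /* the condition itself, protected by lock */
} pause_gate;

static pause_gate gate = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false };
static atomic_long counter;     /* units of work done so far */
static atomic_bool stop;        /* asks the worker to exit */

/* Controller side: request a pause. */
static void gate_pause(pause_gate *g) {
    pthread_mutex_lock(&g->lock);
    g->paused = true;
    pthread_mutex_unlock(&g->lock);
}

/* Controller side: clear the flag and wake any waiter. */
static void gate_resume(pause_gate *g) {
    pthread_mutex_lock(&g->lock);
    g->paused = false;
    pthread_cond_broadcast(&g->resumed);
    pthread_mutex_unlock(&g->lock);
}

/* Worker side: block while paused. The loop re-checks the flag because
 * pthread_cond_wait() may wake up spuriously. */
static void gate_wait_if_paused(pause_gate *g) {
    pthread_mutex_lock(&g->lock);
    while (g->paused)
        pthread_cond_wait(&g->resumed, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

static void *worker(void *arg) {
    (void)arg;
    while (!atomic_load(&stop)) {
        gate_wait_if_paused(&gate);     /* the only place the worker can pause */
        atomic_fetch_add(&counter, 1);  /* one unit of work */
    }
    return NULL;
}

/* Returns 1 if the counter stood still while the worker was paused. */
int demo_pause_resume(void) {
    pthread_t t;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return 0;

    gate_pause(&gate);
    usleep(100 * 1000);                 /* give the worker time to reach the gate */
    long c1 = atomic_load(&counter);
    usleep(100 * 1000);
    long c2 = atomic_load(&counter);

    gate_resume(&gate);
    while (atomic_load(&counter) == c2) /* wait until the worker makes progress */
        usleep(1000);

    atomic_store(&stop, true);
    pthread_join(t, NULL);
    return c1 == c2;
}
```

Because the flag is read and written only under the mutex, a resume cannot slip in between the worker's check and its wait, which is the lost-wakeup race the paragraph above describes.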

For more info

Pthreads

Linux Tutorial Posix Threads

score 1 · Answer 5 · answered Jul 13 '12 at 10:58

If you can use processes instead, you can send job control signals (SIGSTOP / SIGCONT) to the second process. If you still want to share the memory between those processes, you can use SysV shared memory (shmop, shmget, shmctl...).

Even though I haven't tried it myself, it might be possible to use the lower-level clone() syscall to spawn threads that don't share signals. With that, you might be able to send SIGSTOP and SIGCONT to the other thread.

Groovy · Answer 6 · 2012-07-14T19:38:33.550

To pause a thread, you need to make it wait for some event to happen. Waiting on a spin-lock wastes CPU cycles; IMHO this method should be avoided, because those cycles could have been used by other processes/threads. Instead, wait on a non-blocking descriptor (pipe, socket or some other). Example code for using pipes for inter-thread communication can be seen here.

The above solution is useful if your second thread receives information from multiple sources, not just the pause and resume signals. A top-level select/poll/epoll can be used on the non-blocking descriptors. You can specify the wait time for the select/poll/epoll system calls, and the thread then sleeps for at most that long instead of burning CPU cycles. I mention this solution with the forward-looking assumption that your second thread will have more events to handle than just being paused and resumed. Sorry if this is more detail than you asked for.
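
A minimal sketch of the descriptor-based idea, under a few assumptions of my own: commands are single bytes on a pipe ('p' for pause, 'r' for resume, 'q' for quit), the names `ctl`, `send_cmd` and `demo_pipe_control` are hypothetical, and the pipe is left in blocking mode because `poll()` is consulted before every read.

```c
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

static int ctl[2];              /* ctl[0]: worker reads, ctl[1]: controller writes */
static atomic_long counter;     /* units of work done so far */

static void *worker(void *arg) {
    (void)arg;
    int paused = 0;
    struct pollfd pfd = { .fd = ctl[0], .events = POLLIN };
    for (;;) {
        /* Paused: sleep in poll() until a command arrives. Running: just peek. */
        int ready = poll(&pfd, 1, paused ? -1 : 0);
        if (ready > 0) {
            char cmd;
            if (read(ctl[0], &cmd, 1) != 1 || cmd == 'q')
                break;
            paused = (cmd == 'p');          /* 'p' pauses, anything else resumes */
        } else if (!paused) {
            atomic_fetch_add(&counter, 1);  /* one unit of work */
        }
    }
    return NULL;
}

/* Controller side: send a one-byte command; returns 1 on success. */
static int send_cmd(char cmd) { return write(ctl[1], &cmd, 1) == 1; }

/* Returns 1 if the counter stood still while the worker was paused. */
int demo_pipe_control(void) {
    pthread_t t;
    if (pipe(ctl) != 0 || pthread_create(&t, NULL, worker, NULL) != 0)
        return 0;

    int ok = send_cmd('p');
    usleep(100 * 1000);                 /* give the worker time to see the command */
    long c1 = atomic_load(&counter);
    usleep(100 * 1000);
    long c2 = atomic_load(&counter);

    ok = ok && send_cmd('r');
    while (atomic_load(&counter) == c2) /* wait until the worker makes progress */
        usleep(1000);

    ok = ok && send_cmd('q');
    pthread_join(t, NULL);
    close(ctl[0]);
    close(ctl[1]);
    return ok && c1 == c2;
}
```

The same `pollfd` array could carry sockets or other descriptors, which is where this design pays off compared with a plain condition variable.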

Another, simpler approach is a boolean variable shared between the threads. The main thread is the only writer of the variable: 0 signifies stop and 1 signifies resume. The second thread only reads the value. To implement the '0' state, use usleep for some microseconds and then check the value again (assuming a delay of a few microseconds is acceptable in your design). To implement the '1' state, check the value of the variable after a certain number of operations. Alternatively, you can use a signal to move from the '1' state to the '0' state.
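
The shared-variable approach could be sketched like this. The names `run_state`, `quit` and `demo_flag_pause` are made up for the example, and I am assuming C11 atomics for the flag, since a plain variable shared between threads would be a data race.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

static atomic_int  run_state = 1;   /* 1: run, 0: pause; written by main only */
static atomic_bool quit;            /* asks the worker to exit */
static atomic_long counter;         /* units of work done so far */

static void *worker(void *arg) {
    (void)arg;
    while (!atomic_load(&quit)) {
        if (atomic_load(&run_state) == 0) {
            usleep(500);            /* '0' state: nap briefly, then check again */
            continue;
        }
        atomic_fetch_add(&counter, 1);  /* '1' state: one unit of work */
    }
    return NULL;
}

/* Returns 1 if the counter stood still while run_state was 0. */
int demo_flag_pause(void) {
    pthread_t t;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return 0;

    atomic_store(&run_state, 0);        /* pause */
    usleep(100 * 1000);                 /* give the worker time to notice */
    long c1 = atomic_load(&counter);
    usleep(100 * 1000);
    long c2 = atomic_load(&counter);

    atomic_store(&run_state, 1);        /* resume */
    while (atomic_load(&counter) == c2) /* wait until the worker makes progress */
        usleep(1000);

    atomic_store(&quit, true);
    pthread_join(t, NULL);
    return c1 == c2;
}
```

The trade-off is latency against wakeups: a shorter nap resumes faster but wakes the paused thread more often.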

score 0 · Answer 7 · answered Jul 13 '12 at 10:13

You can use a mutex to do that. A sketch of thread2's loop, with my_lock being a pthread_mutex_t shared by both threads:

while (1) {
    /* pause/resume gate */
    pthread_mutex_lock(&my_lock);   /* if thread1 holds my_lock, thread2 waits here */
                                    /* until thread1 unlocks it */
    pthread_mutex_unlock(&my_lock); /* unlock so that thread2 can lock again on the next iteration */

    /* do actual work here */
}

answered Jul 13 '12 at 10:13

aisbaa

using this solution I have to add this check in the 2nd thread, but what I want is that 1st thread should control the execution of 2nd thread, and 2nd thread just do its own work i.e it should not contain any checks of mutexes etc – ZeeAzmat Jul 13 '12 at 10:21
I had the same idea, and even found the pthread_kill function, which to my mind should have worked like the kill function does for processes http://stackoverflow.com/questions/11046720/can-i-stoppause-pthread-execution-using-pthread-kill – aisbaa Jul 13 '12 at 11:21

score 0 · Answer 8 · edited Feb 21 '21 at 09:50

Not sure if you will like my answer or not. But you can achieve it this way.

If it is a separate process instead of a thread, I have a solution using signals. (This might even work for a thread; maybe someone can share their thoughts.)

There is no system currently in place to pause or resume the execution of the processes. But surely you can build one.

Steps I would follow if I wanted this in my project:

1. Register a signal handler in the second process.
2. Inside the signal handler, wait on a semaphore.
3. Whenever you want to pause the other process, send it the signal you registered the handler for. The process will go to sleep.
4. When you want to resume the process, send a different signal. Inside that signal's handler, check whether the semaphore is locked; if it is, release it, so that process 2 continues its execution.

If you implement this, please share your feedback on whether it worked for you. Thanks.

This is probably fine, but it depends on what you are doing. For most uses of threading, condition variables are the appropriate means of inter-thread synchronization. If you have some reason why you *really must* pause a thread in the middle of whatever you are doing, this might be reasonable. But the OP did not state such a reason. — Francis Upton IV, Jul 13 '12 at 11:38
Same issue as with sending SIGSTOP (as I've indicated above, works without writing a handler btw). You do not control when it happens or what state the thread is in. If it is holding a lock at that time, you're in trouble. — Damon, Jul 13 '12 at 11:55

score 0 · Answer 9 · answered Sep 05 '17 at 15:24

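You can suspend a thread simply by sending it a signal:

#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

// Runs in the target thread: blocks until the controller unlocks the mutex.
// Caution: pthread_mutex_lock() is not async-signal-safe, so this is only
// safe if the interrupted code is itself async-signal-safe.
static void thread_control_handler(int n, siginfo_t* siginfo, void* sigcontext) {
    pthread_mutex_lock(&mutex);
    pthread_mutex_unlock(&mutex);
}
// suspend a thread for `time` seconds
void thread_suspend(pthread_t tid, int time) {
    struct sigaction act;
    struct sigaction oact;
    memset(&act, 0, sizeof(act));
    act.sa_sigaction = thread_control_handler;
    act.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&act.sa_mask);
    if (!sigaction(SIGURG, &act, &oact)) {
        pthread_mutex_lock(&mutex);
        // pthread_kill() targets one thread; kill() would signal the whole process
        pthread_kill(tid, SIGURG);
        sleep(time);
        pthread_mutex_unlock(&mutex);
    }
}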
