
I am writing a module for a toolkit that needs to execute some subprocesses and read their output. However, the main program that uses the toolkit may also spawn subprocesses and install a SIGCHLD handler that calls wait(NULL) to get rid of zombie processes. As a result, if a subprocess I create exits while I am inside waitpid(), the child is reaped before the signal handler is called, so the wait() in the signal handler will wait for the next process to end (which could take forever). This behavior is described in the man page of waitpid (see guarantee 2), since the Linux implementation doesn't seem to allow the wait() family to handle SIGCHLD. I have tried popen() and posix_spawn(), and both have the same problem. I have also tried a double fork() so that the direct child exits immediately, but I still cannot guarantee that waitpid() is called after SIGCHLD is received.

My question is: if another part of the program sets up a signal handler that calls wait() (maybe it should call waitpid() instead, but that is not something I can control), is there a way to safely execute child processes without overwriting the SIGCHLD handler (since it might do something useful in some programs) and without leaving any zombie processes?

A small program that shows the problem is below (note that the main program only exits after the long-running child exits, rather than after the short one, which is what it is directly waiting for with waitpid()):

#include <signal.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>

static void
signalHandler(int sig)
{
    printf("%s: %d\n", __func__, sig);
    int status;
    int ret = waitpid(-1, &status, 0);
    printf("%s, ret: %d, status: %d\n", __func__, ret, status);
}

int
main()
{
    struct sigaction sig_act;
    memset(&sig_act, 0, sizeof(sig_act));
    sig_act.sa_handler = signalHandler;
    sigaction(SIGCHLD, &sig_act, NULL);

    if (!fork()) {
        sleep(20);
        printf("%s: long run child %d exit.\n", __func__, getpid());
        _exit(0);
    }

    pid_t pid = fork();
    if (!pid) {
        sleep(4);
        printf("%s: %d exit.\n", __func__, getpid());
        _exit(0);
    }
    printf("%s: %d -> %d\n", __func__, getpid(), pid);

    sleep(1);
    printf("%s, start waiting for %d\n", __func__, pid);
    int status;
    int ret = waitpid(pid, &status, 0);
    printf("%s, ret: %d, pid: %d, status: %d\n", __func__, ret, pid, status);

    return 0;
}
yuyichao

1 Answer

If the process is single-threaded, you can block the CHLD signal temporarily (using sigprocmask), fork/waitpid, then unblock again.

Do not forget to unblock the signal in the forked child: the signal mask is inherited across fork() and exec, and most existing programs expect to start with no signals blocked.

Remember Monica
  • This does not work for many reasons, apart from the fact that there's no way to know whether the main program is single-threaded: 1) I've said that I don't want to override the signal handler, since a process that the main application wants to wait for might finish at any time while I have a different signal handler set up (which might be `SIG_IGN`); 2) if `SIGCHLD` is ignored, `wait` will return an error and cannot get the exit status of the child process, so I'd be better off just setting up my own handler instead of ignoring it. – yuyichao Jan 19 '15 at 01:56
  • First, if the program is single-threaded, then it's trivial to know whether it's single threaded - it is, by definition, and there are many ways to ensure that from a library. Second, you confuse signal handlers with signal mask (different things, read the manpage for sigprocmask). Third, waitpid will NOT return with an error until the child is reaped even when CHLD is ignored, so it will work (try it out) - you might not get an exit status reliably, but you can safely waitpid for the child. – Remember Monica Jan 20 '15 at 08:09
  • If you want answers here, please read the answers properly and try things out before claiming that an answer is wrong. Also, if you ask the wrong question, do not be surprised if you get the wrong answer for your problem. – Remember Monica Jan 20 '15 at 08:10
  • 1) So how do I check whether the program is single-threaded? I haven't found anything like this. 2) I was indeed confusing masking with ignoring (which I now remember I have done once before as well). 3) I've tried it, and waitpid DOES return an error. I know it waits for the correct child to exit, but as my comment said, the issue is "cannot get the exit status of the child process" (OTOH masking indeed does not have this issue). – yuyichao Jan 21 '15 at 23:23