
I try to start 100 processes at the same time in the following code:

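#include <csignal>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <sys/wait.h>
#include <unistd.h>

using namespace std;

int cnt = 0;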

void sig_handler(int signo) {
    pid_t pid;
    int stat;
    pid = wait(&stat);
    cout << "cnt:" << ++cnt << ", pid:" << pid << " signal:" << signo << endl;
}

int main() {
    signal(SIGCHLD, sig_handler);
    for (int i = 0; i < 100; ++i) {
        if (fork() == 0) {
            sleep(1);
            exit(0);
        }
    }
    printf("wait\n");
    while (1);
}

I catch the SIGCHLD signal in sig_handler, but the results vary from run to run: sometimes all child processes are reaped, and sometimes 1 to 4 of them are left as zombies.

[vinllen@my-host]$ ./a.out
wait
cnt:1, pid:4383 signal:17
cnt:2, pid:4384 signal:17
cnt:3, pid:4385 signal:17
cnt:4, pid:4386 signal:17
cnt:5, pid:4387 signal:17
…
cnt:94, pid:4476 signal:17
cnt:95, pid:4477 signal:17
cnt:96, pid:4478 signal:17
cnt:97, pid:4479 signal:17
cnt:98, pid:4480 signal:17

[vinllen@my-host ~]$ ps aux | grep a.out
Vinllen       4382 96.2  0.0  13896  1084 pts/8    R+   15:14   0:03 ./a.out
Vinllen       4481  0.0  0.0      0     0 pts/8    Z+   15:14   0:00 [a.out] <defunct>
Vinllen       4482  0.0  0.0      0     0 pts/8    Z+   15:14   0:00 [a.out] <defunct>
Vinllen       4493  0.0  0.0 105300   864 pts/9    S+   15:14   0:00 grep a.out

I guess the reason is that more than one process exits at the same time and triggers something. Could anyone explain the detailed reason and tell me how to solve this problem?

As I understand it, a double fork and ignoring SIGCHLD are two effective ways to avoid zombies. However, how can I solve this in code that still calls wait?

vinllen
  • `cout` in the signal handler could be the culprit. You're supposed to only call [*async-signal-safe* functions](http://man7.org/linux/man-pages/man7/signal-safety.7.html) in signal handlers. In general, keep your signal handlers as short as possible and communicate with the main code through `volatile sig_atomic_t` variables. –  Aug 02 '17 at 07:34

1 Answer


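Standard signals such as SIGCHLD are not queued. If further SIGCHLDs are raised while one is already pending (probably while your code is in the write syscall), the program will receive just one notification for all of them, so a handler that calls wait once per invocation leaves the other children unreaped.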
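The coalescing can be observed directly with a small sketch. The names `count_sigchld`, `handler_calls` and `handler_calls_for` are made up for this illustration. The idea is to keep SIGCHLD blocked while several children exit, confirm each one has terminated using `waitid` with `WNOWAIT` (which does not reap it), and only then unblock the signal and count how often the handler runs.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical names for this demo; incremented once per handler invocation.
static volatile sig_atomic_t handler_calls = 0;

extern "C" void count_sigchld(int) {
    handler_calls = handler_calls + 1;
}

// Forks n children that exit immediately while SIGCHLD is blocked, then
// unblocks the signal and returns how many times the handler was invoked.
int handler_calls_for(int n) {
    handler_calls = 0;

    struct sigaction sa{};
    sa.sa_handler = count_sigchld;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, nullptr);

    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);

    std::vector<pid_t> pids;
    for (int i = 0; i < n; ++i) {
        pid_t pid = fork();
        if (pid == 0) _exit(0);          // child: exit straight away
        if (pid > 0) pids.push_back(pid);
    }

    // Wait until every child has terminated, but leave it unreaped.
    for (pid_t pid : pids) {
        siginfo_t info;
        waitid(P_PID, static_cast<id_t>(pid), &info, WEXITED | WNOWAIT);
    }

    // The pending SIGCHLD is delivered here, before sigprocmask returns.
    sigprocmask(SIG_UNBLOCK, &block, nullptr);

    // Clean up: reap the zombies and restore the original signal mask.
    for (pid_t pid : pids) waitpid(pid, nullptr, 0);
    sigprocmask(SIG_SETMASK, &old, nullptr);

    return handler_calls;
}
```

On Linux, `handler_calls_for(5)` should return 1 rather than 5, which is the same effect that leaves a few zombies behind in the question's program.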

The correct way to handle this is to loop in your handler, until all finished children are reaped:

void sig_handler(int signo) {
    pid_t pid;
    int stat;
    // Reap every child that has already terminated. WNOHANG makes
    // waitpid() return 0 instead of blocking once none are left.
    while ((pid = waitpid(-1, &stat, WNOHANG)) > 0) {
        if (WIFEXITED(stat)) {
            // Don't actually do this: you should
            // avoid buffered I/O in signal handlers.
            std::cout << "count:" << ++cnt
                      << ", pid:" << pid
                      << " signal:" << signo
                      << std::endl;
        }
    }
}

As mentioned in comments, you should stick to the documented async-signal-safe functions in signal handlers. Buffered I/O (including use of std::cout) can be risky, as the signal handler could be invoked whilst it's manipulating its internal structures. The best way to avoid problems is to limit yourself to communicating with the main code using volatile sig_atomic_t variables.
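A minimal sketch of that flag-based approach follows, assuming a POSIX system. The names `child_exited`, `on_sigchld` and `spawn_and_reap` are invented for illustration. The handler only sets a `volatile sig_atomic_t` flag, and the reaping loop and the printing run in ordinary code. SIGCHLD is kept blocked except inside `sigsuspend`, so the flag cannot change between the check and the sleep.

```cpp
#include <cstdio>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical flag: set by the handler, cleared by the main code.
static volatile sig_atomic_t child_exited = 0;

extern "C" void on_sigchld(int) {
    child_exited = 1;  // nothing else happens in the handler
}

// Forks n children that sleep for a second, then reaps them all.
// Returns the number of children reaped.
int spawn_and_reap(int n) {
    struct sigaction sa{};
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGCHLD, &sa, nullptr);

    // Block SIGCHLD so it is only delivered inside sigsuspend().
    sigset_t block, old, wait_mask;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);

    int started = 0;
    for (int i = 0; i < n; ++i) {
        pid_t pid = fork();
        if (pid == 0) {
            sleep(1);
            _exit(0);
        }
        if (pid > 0) ++started;
    }

    int reaped = 0;
    while (reaped < started) {
        // Sleep until the handler has run at least once.
        while (!child_exited) sigsuspend(&wait_mask);
        child_exited = 0;

        // One notification may stand for many exits, so loop.
        int status;
        pid_t pid;
        while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
            ++reaped;
            std::printf("cnt:%d, pid:%d\n", reaped, static_cast<int>(pid));
        }
    }

    sigprocmask(SIG_SETMASK, &old, nullptr);
    return reaped;
}
```

Calling `spawn_and_reap(100)` from `main` should reap all 100 children and leave no zombies, without the busy `while (1);` loop.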

Toby Speight
  • I'm convinced this is the correct answer, therefore upvoted. Still, I'm pretty sure using `cout` in a signal handler is as unsafe as using C `stdio` functions, so there's a *potential* for bugs if the main code is doing some I/O as well... The whole reaping and printing loop should be moved outside the handler. –  Aug 02 '17 at 07:51
  • I have another question based on this: which process calls `sig_handler`? The parent process, the child process, or a new thread in the parent process? – vinllen Aug 02 '17 at 07:55
  • @Felix: `write` is listed in the safe functions list. Whatever else `std::cout` does (maintaining state etc) is subject to the usual reentrancy constraints of the STL implementation. – Toby Speight Aug 02 '17 at 07:57
  • @vinllen - the signal handler runs in the context of (any existing thread in) the parent process. It can't run in the child, as the child doesn't exist any more (its memory map has probably been reclaimed already). – Toby Speight Aug 02 '17 at 07:58
  • @TobySpeight So when there is no existing thread except the main thread in the parent process, it'll start a new one to run the handler, right? – vinllen Aug 02 '17 at 08:01
  • @vinllen no, signal handlers have nothing to do with threads. If there's more than one thread (which isn't the case in your code), it's unspecified *which* thread will run the handler. A signal handler *interrupts* the main code. –  Aug 02 '17 at 08:03
  • @vinllen, if there's a single thread, as in this code, that's the thread that will be interrupted to run the signal handler. It will resume when the handler returns. Unix signals pre-date threads and so don't assume that threads are available. – Toby Speight Aug 02 '17 at 08:03