How can waitpid() reap more than one child?

Question

Consider this example from chapter 8 of the CSAPP book:


#include "csapp.h"

/* WARNING: This code is buggy! */

void handler1(int sig)
{
    int olderrno = errno;

    if ((waitpid(-1, NULL, 0)) < 0)
        sio_error("waitpid error");
    Sio_puts("Handler reaped child\n");
    Sleep(1);
    errno = olderrno;

}

int main()
{
    int i, n;
    char buf[MAXBUF];

    if (signal(SIGCHLD, handler1) == SIG_ERR)
        unix_error("signal error");
    
    /* Parent creates children */
    for (i = 0; i < 3; i++) {
        if (Fork() == 0) {
            printf("Hello from child %d\n", (int)getpid());
            exit(0);
        }
    }
    
    /* Parent waits for terminal input and then processes it */
    if ((n = read(STDIN_FILENO, buf, sizeof(buf))) < 0)
        unix_error("read");
    
    printf("Parent processing input\n");
    while (1)
        ;
    
    exit(0);

}

It generates the following output:

......
Hello from child 14073
Hello from child 14074
Hello from child 14075
Handler reaped child
Handler reaped child //more than one child reaped
......

The single if around waitpid() is the deliberate bug: written this way, the handler may fail to reap all children. I understand that waitpid() should be put in a while loop to ensure that all children are reaped. What I don't understand is why, with only one waitpid() call, more than one child gets reaped (note that the output shows the handler reaping more than one child). According to this answer, "Why does waitpid in a signal handler need to loop?", a single waitpid() call can only reap one child.

Thanks!

Update: this is beside the point, but the book corrects the handler as follows (also taken from CSAPP):

void handler2(int sig) 
{
    int olderrno = errno;

    while (waitpid(-1, NULL, 0) > 0) {
        Sio_puts("Handler reaped child\n");
    }
    if (errno != ECHILD)
        Sio_error("waitpid error");
    Sleep(1);
    errno = olderrno;
}

I am running this code on my Linux computer.

rith · Answer 1 · 2022-11-24T22:51:17.383

2

The signal handler you installed runs every time its signal (SIGCHLD in this case) is delivered. While it is true that waitpid is executed only once per delivery, it still ends up being executed multiple times, because the handler is called each time a child terminates.

Child n terminates (SIGCHLD): the handler springs into action and uses waitpid to "reap" the child that just exited.

Child n+1 terminates and is handled the same way as child n. This goes on for every child.

~~There is no need to loop it as it gets called only when needed in the first place.~~

Edit: As pointed out below, the book later corrects the handler with a loop because, if multiple children terminate at nearly the same time, the handler may be invoked only once for all of them.

signal(7):

Standard signals do not queue. If multiple instances of a standard signal are generated while that signal is blocked, then only one instance of the signal is marked as pending (and the signal will be delivered just once when it is unblocked).

Looping on waitpid ensures that all exited children are reaped, not just one of them, as is the case right now.
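A minimal sketch of the signal(7) behaviour quoted above, assuming a single-threaded process on Linux. The names `coalesced_deliveries`, `count_handler`, and `deliveries` are made up for this illustration, and SIGUSR1 stands in for SIGCHLD so that no child processes are needed. It blocks the signal, raises it n times, unblocks it, and reports how many times the handler actually ran:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>

/* Hypothetical counter: how many times the handler has run. */
static volatile sig_atomic_t deliveries = 0;

static void count_handler(int sig)
{
    (void)sig;
    deliveries++;
}

/*
 * Block SIGUSR1, raise it n times, unblock it, and return the number of
 * handler invocations. Returns -1 on error.
 */
int coalesced_deliveries(int n)
{
    struct sigaction sa;
    sigset_t block;

    sa.sa_handler = count_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1;

    deliveries = 0;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, NULL) < 0)
        return -1;

    /* While blocked, every raise only marks the same signal as pending. */
    for (int i = 0; i < n; i++)
        raise(SIGUSR1);

    /* The one pending instance is delivered before this call returns. */
    if (sigprocmask(SIG_UNBLOCK, &block, NULL) < 0)
        return -1;

    return (int)deliveries;
}
```

Calling `coalesced_deliveries(3)` should return 1: the three raises collapse into a single pending signal, which is why one handler invocation may have to account for several exited children.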

Why is looping solving the issue of multiple signals?

Picture this: you are inside the handler, handling a SIGCHLD you have received, and in the meantime more children terminate. SIGCHLD is blocked while its handler runs, and the additional signals cannot queue up: at most one is left pending. waitpid, however, does not depend on signals. Each call reaps one terminated child, whether or not a separate signal was delivered for it. By looping on waitpid, you reap every child that has already exited, even if several signals were merged into one. With a single call you reap exactly one child per handler invocation, which works only if no signals were merged.

waitpid still returns once there are no more children to reap: it returns -1 with errno set to ECHILD, which is what ends the loop. It is important to understand that the loop is only there to catch children that terminate while you are already in the signal handler, whose signals may have been merged. A child that terminates during normal code execution triggers the handler as usual.
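To see the difference between the single call and the loop, here is a sketch that forces the merged-signal case on purpose. Everything in it is an assumption made for illustration: the function `reaped_by_one_sigchld`, the flag `use_loop`, the limit `MAX_CHILDREN`, and the trick of blocking SIGCHLD until all children have exited. It also uses WNOHANG in the handler, rather than the 0 option from the book, so that the handler never blocks. It is written with Linux in mind and a single-threaded caller that has no other children.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 16   /* arbitrary limit for this sketch */

static volatile sig_atomic_t reaped_in_handler = 0;
static volatile sig_atomic_t use_loop = 0;

static void sigchld_handler(int sig)
{
    int olderrno = errno;

    (void)sig;
    if (use_loop) {
        /* Fixed version: keep reaping until no exited child is left. */
        while (waitpid(-1, NULL, WNOHANG) > 0)
            reaped_in_handler++;
    } else {
        /* Buggy version: one waitpid call per handler invocation. */
        if (waitpid(-1, NULL, WNOHANG) > 0)
            reaped_in_handler++;
    }
    errno = olderrno;
}

/* Returns how many of n children the handler reaped, or -1 on error. */
int reaped_by_one_sigchld(int n, int loop)
{
    struct sigaction sa, old_sa;
    sigset_t chld, old_mask;
    pid_t pids[MAX_CHILDREN];
    siginfo_t info;
    int forked = 0;

    if (n < 1 || n > MAX_CHILDREN)
        return -1;
    reaped_in_handler = 0;
    use_loop = loop;

    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, &old_sa) < 0)
        return -1;

    /* Block SIGCHLD so all terminations merge into one pending signal. */
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, &old_mask);

    while (forked < n) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            _exit(0);              /* child exits immediately */
        pids[forked++] = pid;
    }

    /* Wait until each child has exited, but leave it reapable (WNOWAIT). */
    for (int i = 0; i < forked; i++)
        waitid(P_PID, (id_t)pids[i], &info, WEXITED | WNOWAIT);

    /* Unblocking delivers the single pending SIGCHLD: the handler runs once. */
    sigprocmask(SIG_UNBLOCK, &chld, NULL);

    /* Clean up: restore the old handler, reap leftovers, restore the mask. */
    sigaction(SIGCHLD, &old_sa, NULL);
    for (int i = 0; i < forked; i++)
        waitpid(pids[i], NULL, 0); /* fails with ECHILD if already reaped */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);

    return (int)reaped_in_handler;
}
```

On Linux, `reaped_by_one_sigchld(3, 0)` should return 1 and `reaped_by_one_sigchld(3, 1)` should return 3. The handler is invoked once in both cases; only the loop lets that one invocation reap all three children. In the program from the question, the children usually exit far enough apart that each one gets its own handler invocation, which is why the buggy version often appears to work.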

If you are still in doubt, try reading the answers to these two related questions.

How to make sure that `waitpid(-1, &stat, WNOHANG)` collect all children processes
Why does waitpid in a signal handler need to loop? (first two paragraphs)

The first one uses the WNOHANG flag, but this only makes waitpid return immediately, instead of waiting, if no child process is ready to be reaped.
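A small sketch of what WNOHANG changes, with a made-up helper `wnohang_probe` and a child that simply sleeps until it is killed. It assumes the caller has not set SIGCHLD to be ignored. The helper records what waitpid reports in three situations: child still running, child terminated, and child already reaped.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper. On success returns 0 and fills:
 *   out[0]  return value of waitpid(..., WNOHANG) while the child runs
 *   out[1]  1 if a blocking waitpid reaped the child after it was killed
 *   out[2]  1 if a further waitpid failed with ECHILD
 * Returns -1 if fork fails.
 */
int wnohang_probe(int out[3])
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();               /* child sleeps until it is killed */
    }

    /* Child exists but has not exited: WNOHANG returns 0, no blocking. */
    out[0] = (int)waitpid(pid, NULL, WNOHANG);

    /* Terminate the child, then reap it with an ordinary blocking call. */
    kill(pid, SIGKILL);
    out[1] = (waitpid(pid, NULL, 0) == pid);

    /* Nothing left to wait for: -1 with errno set to ECHILD. */
    out[2] = (waitpid(pid, NULL, WNOHANG) == -1 && errno == ECHILD);

    return 0;
}
```

After a successful call, `out` should hold 0, 1, 1. The first value is the case that matters inside a handler: a return of 0 means "children exist, but none has exited yet", so a `while (waitpid(-1, NULL, WNOHANG) > 0)` loop stops instead of blocking.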

edited Nov 24 '22 at 22:51

answered Nov 24 '22 at 20:46

rith


In the book, they did use a while loop to correct this: see my update. – sociala Nov 24 '22 at 21:04
the book made a point to use the while loop? – sociala Nov 24 '22 at 21:09
The while loop is always the correct way to go, however this does not impact the outcome of multiple children being reaped or not. The reason they are is as was mentioned in the initial answer. The buggy part has more to do with incorrect scheduling in fact, you could run into a case where only 2 of 3 children are reaped. This last case can be "artificially" mitigated by using `sleep` calls. – rith Nov 24 '22 at 21:17

The loop _is_ necessary, because, if two or more children exit at nearly the same time, you may get _one_ SIGCHLD for _both_ of them. This has almost nothing to do with scheduling, it's because signals don't queue (unless you specifically set them up to, and I don't think it's possible to do that for kernel-generated SIGCHLD). – zwol Nov 24 '22 at 21:26
@Rithari I don't understand what you are saying – sociala Nov 24 '22 at 21:35
@zwol correct - I got confused there for a minute. Indeed it is because multiple signals are merged into one. I've amended my answer to respect these specifications. – rith Nov 24 '22 at 21:37
@rith @zwol, thanks for the input. Still not clear: how is `while` going to help if multiple signals are sent to the handler at the same time? If only one is actually received by the handler, the while loop is still going to execute once, right (since it is inside the handler)? How is adding a `while` loop going to ensure that multiple signals are received? – sociala Nov 24 '22 at 22:22
@sociala I did my best to try and further explain the reasoning behind looping and included two answers to read from other questions. Check out the updated answer and hopefully this helps you out. – rith Nov 24 '22 at 22:47

IIUC: If one child exits during the signal handler, there will be a queued SIGCHLD which gets delivered as soon as the first signal handler returns (with some user-space code doing a [`sigreturn(2)` system call](https://man7.org/linux/man-pages/man2/sigreturn.2.html) to let the kernel know). So there's no race condition even if a child exits after the loop decides to stop looping. But if two children exit during that window, there'd still only be one SIGCHLD, hence the need for a loop. And the design is safe from race conditions because it queues one pending signal. – Peter Cordes Nov 25 '22 at 05:31
