
In the book "Unix Network Programming, Volume 1" by Richard Stevens, in the section "Difference between wait vs waitpid", it says waitpid() should be used instead of wait(). I understand the problem described when using wait(): when multiple child processes terminate simultaneously and multiple SIGCHLDs are raised, the parent may be delivered only the first of them, and the others are lost because the kernel does not queue signals. OK, but how does waitpid() avoid this problem?

Below is how the book uses waitpid() in the signal handler:

    while ( (pid = waitpid(-1, &stat, WNOHANG) ) > 0) {
        printf("child %d terminated\n", pid);
    }
Jayanth

1 Answer


The difficulty is that a SIGCHLD signal only tells you that at least one child process has exited or changed its state. You don't know how many wait or waitpid calls are required.

According to the documentation, e.g. https://linux.die.net/man/2/waitpid or https://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html, a call

    pid_t pid = wait(&status);

is equivalent to

    pid_t pid = waitpid(-1, &status, 0);
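To see that equivalence in practice, here is a minimal sketch. The function name `child_exit_code` and the exit code 7 are made up for illustration. It forks one child and reaps it with either call, and both paths should report the same exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with code 7, reap it with
 * wait() or with waitpid(-1, ..., 0), and return the decoded exit code.
 * Returns -1 if fork or the reaping call fails. */
int child_exit_code(int use_waitpid)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(7);                       /* child: terminate immediately */

    /* Both calls block until some child has changed state. */
    pid_t done = use_waitpid ? waitpid(-1, &status, 0) : wait(&status);
    if (done != pid)
        return -1;

    assert(WIFEXITED(status));          /* the child always calls _exit */
    return WEXITSTATUS(status);
}
```

Calling `child_exit_code(0)` and `child_exit_code(1)` takes the wait and the waitpid path respectively. This assumes the calling process has no other children at the time.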

Your example

    while ( (pid = waitpid(-1, &stat, WNOHANG) ) > 0) {
        printf("child %d terminated\n", pid);
    }

uses the additional flag WNOHANG, which makes the call non-blocking. This means you can call waitpid repeatedly in a loop until it reports that no more terminated children are available: it returns 0 if children still exist but none has changed state, and -1 (with errno set to ECHILD) if there are no children left. That way you reap every child that has exited so far without knowing how many there are. After the loop ends, the parent process can continue its normal processing.
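As a rough illustration of why the loop matters, the sketch below keeps SIGCHLD blocked while several children exit, so their signals are merged into a single pending one, and then lets a handler with the WNOHANG loop reap them all. The names `on_sigchld`, `reaped` and `spawn_and_reap` are my own and not from the book. The handler counts instead of calling printf, because printf is not async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of children reaped by the handler (hypothetical name). */
static volatile sig_atomic_t reaped = 0;

/* One SIGCHLD may stand for several exited children, so loop until
 * waitpid reports that nothing more is ready right now. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;            /* waitpid may overwrite errno */
    int status;

    (void)signo;
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Fork n children that exit at once; return how many the handler reaped. */
int spawn_and_reap(int n)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;
    int forked;

    assert(n >= 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
        return -1;

    /* Block SIGCHLD so the children's signals pile up as one pending signal. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);     /* mask used while waiting */

    reaped = 0;
    for (forked = 0; forked < n; forked++) {
        pid_t pid = fork();
        if (pid == -1)
            break;
        if (pid == 0)
            _exit(0);                   /* child: terminate immediately */
    }

    /* Atomically unblock SIGCHLD and sleep until the handler has run. */
    while (reaped < forked)
        sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);  /* restore previous disposition */
    return reaped;
}
```

With this sketch, `spawn_and_reap(5)` should return 5 even though the parent may see far fewer than five signal deliveries. If the loop in the handler were replaced by a single wait call, some children could stay zombies.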

In contrast, wait blocks whenever there is still a running child process that has not exited or changed its state yet. This is what would happen on the last iteration if you called wait in a similar loop. There is no option to make wait non-blocking in this case. (You could interrupt it with a signal, though.)
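The difference is easy to observe with a child that is deliberately kept alive. This is only a sketch under my own assumptions: the function name `poll_running_child` and the pipe trick to hold the child are not from the book. The WNOHANG call returns 0 immediately, while a plain wait at the same point would block until the child exits.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the result of a WNOHANG poll on a child
 * that is still running (expected to be 0), or -1 on error. */
pid_t poll_running_child(void)
{
    int fds[2];
    int status;

    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        char c;
        close(fds[1]);
        /* Child: block until the parent closes its write end (EOF). */
        if (read(fds[0], &c, 1) < 0)
            _exit(1);
        _exit(0);
    }
    close(fds[0]);

    /* Child is alive and unchanged: WNOHANG returns 0 without blocking. */
    pid_t polled = waitpid(pid, &status, WNOHANG);

    close(fds[1]);                      /* EOF lets the child exit */
    if (waitpid(pid, &status, 0) != pid)  /* blocking reap, like wait() */
        return -1;
    assert(WIFEXITED(status));
    return polled;
}
```

Here `poll_running_child()` should return 0, which is the "nothing to reap right now" answer that ends the loop from the question.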

So waitpid does not avoid the problem, but it allows you to handle it without blocking your parent process. Whether the non-blocking waitpid is useful or required, or whether a possibly blocking wait is sufficient, depends on your program.

Bodo
  • Hmm, it's true that a non-blocking version of wait is required in a TCP server. So, for now, it seems "the multiple simultaneous SIGCHLDs" problem applies even to waitpid, unless someone provides us with another explanation for that. – Jayanth Apr 05 '22 at 13:08
  • @Jayanth This is what I meant with the last paragraph. Using `waitpid` does not change the behavior of the signals. You can still get fewer signals than exited processes, but you can call `waitpid` in a loop without blocking to "wait" for more than one process if necessary. – Bodo Apr 05 '22 at 13:17