
I am trying to write a program in which I am forking a child from a parent, and handling SIGCHLD signals using a handler, in which I use waitpid(). When I execute it, however, I am sometimes getting a return value of 0 from waitpid, along with errno being set to EINTR. What does that mean?

Here is my SIGCHLD handler:

    pid_t pid;
    int status;

    while((pid = waitpid(-1, &status, WNOHANG|WUNTRACED)) > 0)
    {
        printf("Handler reaped child %d\n", (int)pid);
        if(WIFEXITED(status))
        {
            deletejob(job_list, pid);
        }
        else if(WIFSIGNALED(status))
        {
            deletejob(job_list, pid);
        }
        else if(WIFSTOPPED(status))
        {
            struct job_t *job = getjobpid(job_list, pid);
            job->state = ST;
        }
    }

    printf("%d %d\n", pid, errno);
    if(errno != ECHILD)
    {
        unix_error("waitpid error");
    }

    return;

Here is the parent function, in which I fork the child:

    pid_t pid;
    sigset_t block_set;
    int file_descriptor;
    if(tok.outfile == NULL)
    {
        file_descriptor = STDOUT_FILENO;
    }
    else
    {
        file_descriptor = open(tok.outfile, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    }

    sigemptyset(&block_set);
    sigaddset(&block_set, SIGCHLD);
    sigaddset(&block_set, SIGINT);
    sigaddset(&block_set, SIGTSTP);
    sigprocmask(SIG_BLOCK, &block_set, NULL);

    pid = fork();

    if(pid == 0)
    {
        sigprocmask(SIG_UNBLOCK, &block_set, NULL);
        setpgid(0, 0);
        dup2(file_descriptor, 1);

        if(execve(tok.argv[0], tok.argv, environ) < 0)
        {
            /* execve() only returns if it failed */
            exit(0);
        }
    }
    else
    {
        if(bg == 1)
        {
            addjob(job_list, pid, BG, cmdline);
            sigprocmask(SIG_UNBLOCK, &block_set, NULL);
            int jid = pid2jid(pid);
            printf("[%d] (%d) %s\n", jid, pid, cmdline);
        }
        else if(bg == 0)
        {
            addjob(job_list, pid, FG, cmdline);
            sigprocmask(SIG_UNBLOCK, &block_set, NULL);
        }

        if(bg == 0)
        {
            struct job_t *job = getjobpid(job_list, pid);
            while(pid == fgpid(job_list))
            {
                sleep(1);
            }
        }
    }
}

return;
Parag Goel
  • `waitpid()` returns 0 when there are no child processes ready to be reaped, if you specify `WNOHANG`. Since it's not returning an error, you shouldn't be checking `errno` then. Some other system call must have been interrupted earlier and set it. – Crowman Mar 27 '15 at 23:53
  • OK. Got it. Is there any way to check which system call's interruption is setting errno then? – Parag Goel Mar 27 '15 at 23:55
  • Yes, by checking the return value of every single one of your system calls, and checking `errno` if any of them return an error. – Crowman Mar 27 '15 at 23:56
  • Alright. I'll do that. Thanks for the help!! BTW removing the check for errno resolved the stated problem. – Parag Goel Mar 27 '15 at 23:57
  • By the way, you say "I am sometimes getting a return value of 0 from waitpid", but in the absence of errors, `waitpid()` will return 0 every single time your handler gets called, since your `while` loop continues to call it until it does, which is how you know that all the available children have been reaped. – Crowman Mar 28 '15 at 00:00
  • The `EINTR` could have happened inside some library function, not necessarily in one of your direct system calls. In general, you should ignore `errno` unless you've just made a system call that reported an error. – Barmar Mar 28 '15 at 00:05
  • What's likely is that the `SIGCHLD` interrupted some system call inside the stdio library. – Barmar Mar 28 '15 at 00:06
  • @Barmar so the search for the system call which caused the errno to be set to EINTR might be a fruitless search? – Parag Goel Mar 28 '15 at 00:15
  • @ParagGoel: Indeed. Any standard function (except a few specified not to) can clobber `errno` as part of normal, successful operation. The only time `errno` is meaningful is when the return value of the library function that just returned indicates an error. – R.. GitHub STOP HELPING ICE Mar 28 '15 at 00:17
  • It's never fruitless to check if your system calls are returning errors. You should always do that for every system call which can fail. If you really want to know which one is failing, and it might be a library function, `strace` can help you. – Crowman Mar 28 '15 at 00:17
  • @ParagGoel Yes, that's what I'm suggesting. You got a false alarm by looking at it when you shouldn't have. – Barmar Mar 28 '15 at 00:17
  • @PaulGriffiths what if the system call is some library function, and not some invocation in my program? Wouldn't that get a tad bit difficult? (Apologies for the noob questions, this is my first time dealing with a systems application) – Parag Goel Mar 28 '15 at 00:21
  • @ParagGoel: `strace` will show all the system calls your process makes, including those invoked by library functions. The shared library code gets mapped into your process's memory space, so the library function call is still identifiably being made by your particular process, despite the fact it's a shared library. – Crowman Mar 28 '15 at 00:25
  • Hmm.. OK.. I'll look into debugging this now. This was a big help for me! Thanks for this info! – Parag Goel Mar 28 '15 at 00:38
  • @ParagGoel: For the avoidance of doubt, system calls getting interrupted is absolutely routine, and the fact that `errno` might equal `EINTR` is nothing at all to be concerned about. By default, signals interrupt blocking system calls on the general principle that signals are important, and you might want to deal with them, so the system call returns early. If you don't care about the signal, you just restart the system call. `errno` might still be set to `EINTR` after this happens, but that doesn't indicate that there's a problem. – Crowman Mar 28 '15 at 00:41
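Putting the advice in these comments together: check the return value first, look at `errno` only when the call reports failure, and simply restart a call that failed with `EINTR`. A minimal sketch of that pattern for `read()`, where `read_retry` is a hypothetical helper name and not a standard function:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read() that restarts itself when a signal
 * interrupts it, instead of treating EINTR as a real failure. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
        /* errno is examined only because read() has just returned -1 */
    } while (n == -1 && errno == EINTR);

    return n;  /* >= 0 on success; -1 with errno set on a genuine error */
}
```

Code that calls `read_retry()` looks at `errno` only when it returns -1, so a stale `EINTR` left behind by an earlier, already-restarted call is never mistaken for a failure. Installing the signal handler with the `SA_RESTART` flag via `sigaction()` is the other usual way to have many interrupted calls restarted automatically, although not every system call honors it.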

1 Answer


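When called with WNOHANG, waitpid() returns 0 when children still exist but none of them has changed state yet, i.e. there are no more children ready to be reaped. (If there are no children at all, it returns -1 with errno set to ECHILD instead.) Your SIGCHLD handler gets called when a child process exits (or, since you pass WUNTRACED, stops), so there will usually be at least one to reap. But because multiple instances of the same signal don't get queued, several children may have finished by the time the handler runs, so there might be more than one child process to reap.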

So what this while loop does:

    while((pid = waitpid(-1, &status, WNOHANG|WUNTRACED)) > 0)

is to basically say "call waitpid() to reap the child I know is waiting, and then keep calling it over and over to reap any additional children which may happen to be available, until it returns 0, at which point I know I've reaped all the children that are available."
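A minimal sketch of the same loop outside a signal handler shows why it is written this way: if several children have already exited, one pass reaps all of them, and the loop stops as soon as waitpid() returns 0 (children remain, none ready) or -1 (for example, no children left). `reap_available` is a hypothetical name, and WUNTRACED is left out to keep it short.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: reap every child that has already terminated,
 * without blocking, and return how many were reaped.
 * On return, *last holds waitpid()'s final result:
 *    0  -> children still exist, but none is ready yet
 *   -1  -> waitpid() failed; errno == ECHILD means no children remain */
int reap_available(pid_t *last)
{
    int status;
    int reaped = 0;
    pid_t pid;

    /* Same loop shape as the handler in the question (WUNTRACED omitted). */
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
        reaped++;

    *last = pid;
    return reaped;
}
```

For example, after forking three children that call `_exit(0)` immediately, a single call made once all of them have exited would be expected to return 3; a further call then returns 0 and leaves `*last` equal to -1 with errno set to ECHILD. A call made while a child is still running stops at 0 instead.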

Returning 0 here is therefore not an error; it's a deliberate device for reaping an unknown number of children. waitpid() is not reporting anything through errno in this case, so that EINTR must have been left over from some earlier system call that got interrupted.

Generally speaking, you should not check errno unless a function returns an error. There are some unusual cases where the return value does not unambiguously indicate an error. strtol() is one: a return of LONG_MAX or LONG_MIN could mean the parsed number really was that value, or could mean that it was out of range. In these cases you can set errno to 0 before calling the function and, if the return value suggests there might be an error, check whether errno has been set (to ERANGE, in the case of strtol()).
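A minimal sketch of the errno = 0 idiom for strtol(), assuming a hypothetical helper `parse_long` that should accept a whole string as a base-10 long. It checks the end pointer as well, since errno alone does not reliably report that no digits were found.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a whole string as a base-10 long.
 * Returns true and stores the value in *out on success. */
bool parse_long(const char *s, long *out)
{
    char *end;

    errno = 0;                     /* clear any stale value first */
    long v = strtol(s, &end, 10);

    if (end == s || *end != '\0')  /* no digits at all, or trailing junk */
        return false;
    if ((v == LONG_MAX || v == LONG_MIN) && errno == ERANGE)
        return false;              /* overflowed, not a genuine LONG_MAX/MIN */

    *out = v;
    return true;
}
```

With this helper, "0" parses successfully to 0, while "abc", "12x", and a 26-digit number are all rejected; the last one because strtol() returns LONG_MAX and sets errno to ERANGE.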

Crowman
  • So the only situation when return value is -1 is when there are actually no children left in the calling process? I suppose only in this situation errno is set to ECHILD, then. – Parag Goel Mar 28 '15 at 00:18
  • `waitpid()` can return `-1` for a number of reasons, such as passing it a specific pid that does not exist, or passing it invalid options, or if it's interrupted by a signal when you don't specify `WNOHANG`. If you call it in the way you did, it will return `-1` if no children of the specified type exist (which in this case means any child process, since you specified `-1` as the first argument). – Crowman Mar 28 '15 at 00:33
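The cases described here for a call like the one in the question can be separated by testing the return value first and consulting errno only in the -1 branch. A sketch, where `poll_children` and the `wait_result` enum are hypothetical names chosen for illustration:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical classification of one non-blocking waitpid() call. */
enum wait_result { WAIT_REAPED, WAIT_NONE_READY, WAIT_NO_CHILDREN, WAIT_ERROR };

enum wait_result poll_children(pid_t *pid_out, int *status_out)
{
    pid_t pid = waitpid(-1, status_out, WNOHANG);

    *pid_out = pid;
    if (pid > 0)
        return WAIT_REAPED;       /* a child terminated; *status_out is valid */
    if (pid == 0)
        return WAIT_NONE_READY;   /* children exist, none ready; ignore errno */
    if (errno == ECHILD)
        return WAIT_NO_CHILDREN;  /* -1: no child processes at all */
    return WAIT_ERROR;            /* -1: some other failure, e.g. EINVAL */
}
```

In a process with no children, poll_children() reports WAIT_NO_CHILDREN; while a child is still running it reports WAIT_NONE_READY; once that child exits, it reports WAIT_REAPED along with the child's pid.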