
I am seeing unusual signal numbers (for example 50, 80 or 117) from the following code when waiting for a child process to terminate. This only happens with one particular child process, whose source code I have no access to, and only some of the time.

I want to know what these unusual values mean, given that NSIG == 32, and where I can find documentation for them in the headers or man pages.

Note that this code runs in a loop sending progressively more menacing signals until the child terminates.

int status, signal;

if (waitpid(m_procId, &status, WNOHANG) < 0) {
    LOGERR << "Failed to wait for process " << name() << ": " <<
        strerror(errno) << " (" << errno << ")";
    break;
} else if (WIFEXITED(status)) {
    m_exitCode = WEXITSTATUS(status);
    terminated = true;
    LOGINF << "Process " << name() << " terminated with exit code " << m_exitCode;
} else if (WIFSIGNALED(status)) {
    signal = WTERMSIG(status);    // !!! signal is sometimes 50, 80 or 117 !!!
    terminated = true;
    LOGINF << "Process " << name() << " terminated by signal " << signal;
} else {
    LOGWRN << "Process " << name() << " changed state but did not terminate.  status=0x" <<
        hex << status;
}

This is running under OS X 10.8.4, but I have also seen it on the 10.9 GM seed.

EDIT: Modifying the code as below makes it more robust; however, the child process is sometimes orphaned, presumably because the loop doesn't do enough to kill it.

else if (WIFSIGNALED(status)) {
    signal = WTERMSIG(status);
    if (signal < NSIG) {
        terminated = true;
        LOGINF << "Process " << name() << " terminated by signal " << signal;
    } else {
        LOGWRN << "Process " << name() << " produced unusual signal " << signal
               << "; assuming it's not terminated";
    }
}

Note this code is part of the Process::unload() method of this class.

trojanfoe

1 Answer

From the OS X man page for waitpid: when specifying WNOHANG, you should check for a return value of 0:

 When the WNOHANG option is specified and no processes wish to report status, wait4() returns a process
 id of 0.

 The waitpid() call is identical to wait4() with an rusage value of zero.  The older wait3() call is the
 same as wait4() with a pid value of -1.

The code posted does not check for this, which suggests to me that the value of status is likely junk (the value of the int is never initialized). This could cause what you are seeing.

EDIT: status is indeed only set when waitpid returns > 0.
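As a minimal sketch of the check described above, the helper below only decodes status when waitpid() actually returned the child's pid. The names `ChildState`, `PollResult` and `pollChild` are hypothetical and not part of the original code; the `WUNTRACED` flag and the `EINTR` retry are assumptions about what the caller wants.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <cerrno>

// Hypothetical result type for illustration; not from the original code.
enum class ChildState { Error, StillRunning, Exited, Signaled, Stopped };

struct PollResult {
    ChildState state;
    int value;  // exit code, signal number, or errno, depending on state
};

// Poll a child without blocking. The status word is inspected only when
// waitpid() returned a positive pid; otherwise it holds no valid data.
PollResult pollChild(pid_t pid) {
    int status = 0;
    pid_t rc;
    do {
        // WUNTRACED (an assumption here) also reports stopped children.
        rc = waitpid(pid, &status, WNOHANG | WUNTRACED);
    } while (rc < 0 && errno == EINTR);  // retry if interrupted by a signal

    if (rc < 0) return {ChildState::Error, errno};
    if (rc == 0) return {ChildState::StillRunning, 0};  // nothing to report yet
    if (WIFEXITED(status)) return {ChildState::Exited, WEXITSTATUS(status)};
    if (WIFSIGNALED(status)) return {ChildState::Signaled, WTERMSIG(status)};
    if (WIFSTOPPED(status)) return {ChildState::Stopped, WSTOPSIG(status)};
    return {ChildState::StillRunning, 0};  // some other state change
}
```

In a loop like the one in the question, `StillRunning` would be the cue to wait a little and then send the next signal, while `Exited` and `Signaled` mean the child has been reaped and the loop can stop.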

MikeGM
  • Yeah that is certainly the reason for the "funny" signal values, however can you explain why `waitpid()` would return `0` and yet the child is still alive? I have made the changes to handle return of `0` and the children do terminate, however it takes a few loops. – trojanfoe Oct 15 '13 at 09:15
  • I also used the `WUNTRACED` flag to `waitpid()` and it seems to have solved the issue. I cannot award the bounty for another 4 hours, so if I forget please remind me. Many thanks for your help. – trojanfoe Oct 15 '13 at 09:21
  • Glad it helped! `waitpid` will only report status (and return >0) on a child that is stopped or has already terminated. It sounds like your usage of `WUNTRACED` has dealt with this, though. – MikeGM Oct 15 '13 at 09:49
  • Yes; the child processes communicate with the main process using a text-based protocol. I must be doing something to cause the child to stop for `SIGTTIN` et al.; however, it's not clear how, as I don't close the pipes until after the child has terminated. Oh well, no matter, as long as it's sorted. – trojanfoe Oct 15 '13 at 10:27
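To illustrate the `WUNTRACED` behaviour discussed in the comments, here is a small sketch in which a throwaway child stops itself and the parent observes the stop through waitpid(). The function name `observeStoppedChild` is hypothetical, and the child stops via `SIGSTOP` rather than `SIGTTIN` purely to keep the example self-contained.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical demo: returns the signal that stopped a throwaway child,
// or -1 if the stop could not be observed.
int observeStoppedChild() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        raise(SIGSTOP);  // the child stops itself
        _exit(0);        // reached only if the child is ever continued
    }

    int status = 0;
    int stopSignal = -1;
    // With WUNTRACED, waitpid() also returns when the child stops,
    // not only when it terminates.
    if (waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status)) {
        stopSignal = WSTOPSIG(status);
    }

    // Clean up: SIGKILL terminates even a stopped process; then reap it.
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return stopSignal;
}
```

Without `WUNTRACED`, the first waitpid() call would not report the stop at all, which matches the observation that the child looked alive but unresponsive to the milder signals.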