
How can WTERMSIG() evaluate to a value greater than 63, such as 67 or 123?

Here's how it happens. I'm using popen() to run unbuffer, which in turn runs a shell script. The next block of code is a while() loop that calls fgets() on the FILE* stream returned by popen() until fgets() returns NULL. This is followed by pclose().

On rare occasions, pclose() returns a status other than -1 for which WIFSIGNALED() is true. The bizarre part is that WTERMSIG() then evaluates to a value greater than 63, such as 67 or 123. What could cause this?

The shell script is completing all of its tasks, so it's not getting interrupted partway through. I could understand 1–31 (standard signals) or 32–63 (extended real-time signals). Attempting to generate signals 67 or 123 on our platform (Timesys Linux) results in an error.
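For reference, here is a minimal sketch of how the wait-status macros classify a raw integer. It assumes the usual Linux/glibc encoding, in which WTERMSIG() is simply the low 7 bits of the status and the exit code sits in the next 8 bits. The helper name describe_wait_status is hypothetical and only serves to illustrate the decoding.

```cpp
#include <cassert>
#include <string>
#include <sys/wait.h>

// Hypothetical helper: describe a wait status the way the W* macros see it.
// Assumes the common Linux encoding (signal in bits 0-6, exit code in bits 8-15).
std::string describe_wait_status(int status) {
    // -1 is the error return of pclose()/waitpid(), not a wait status.
    assert(status != -1);

    if (WIFEXITED(status)) {
        return "exited with code " + std::to_string(WEXITSTATUS(status));
    }
    if (WIFSIGNALED(status)) {
        return "terminated by signal " + std::to_string(WTERMSIG(status));
    }
    if (WIFSTOPPED(status)) {
        return "stopped by signal " + std::to_string(WSTOPSIG(status));
    }
    return "unrecognized status";
}
```

Under that assumption, describe_wait_status(67 << 8) reports an exit with code 67, while describe_wait_status(67) reports termination by signal 67. The macros do not check that the number is a signal the kernel can actually deliver, so a status whose low 7 bits hold an arbitrary value between 1 and 126 would produce the symptom described here.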

Here is some of the relevant code:

FILE* fp = popen(command.c_str(), "r");
if (fp == NULL) {
    // Handle error
}

while (fgets(line, sizeof(line), fp) != NULL) {
    // Do stuff with output of child process
}

pclose_status = pclose(fp);
pclose_errno = errno;

if (pclose_status != -1 && WIFEXITED(pclose_status)) {
    // Happy ending, occurs 99% of the time
    return WEXITSTATUS(pclose_status);
}

// Check if pclose returned an error. If so, log the result.
if (pclose_status == -1) {
    // Log an error and pclose_errno
}
else {
    if (WIFSIGNALED(pclose_status)) {
        // This happens maybe 1% of the time, but the value 
        // printed is > 63.
        Syslog::error("Child process terminated by signal %d", 
                       WTERMSIG(pclose_status));
    }
    if (WIFSTOPPED(pclose_status)) {
        // Hasn't happened.
        Syslog::error("Child process stopped by delivery of signal %d",
                       WSTOPSIG(pclose_status));
    }
    if (WIFCONTINUED(pclose_status)) {
        // Hasn't happened.
        Syslog::error("Child process resumed by delivery of SIGCONT");
    }
}
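
The fragment above omits its declarations. A self-contained sketch of the same popen()/fgets()/pclose() sequence, which also keeps the undecoded status so it can be logged next to the WTERMSIG() value, might look like the following. The names CommandResult and run_command are hypothetical, and the 256-byte line buffer is an arbitrary choice.

```cpp
#include <cassert>
#include <cerrno>
#include <stdio.h>
#include <string>
#include <sys/wait.h>

// Hypothetical result type; not part of the original code.
struct CommandResult {
    std::string output;  // everything the child wrote to stdout
    int raw_status;      // value returned by pclose(), undecoded (-1 on failure)
    int saved_errno;     // errno captured right after popen()/pclose()
};

// Hypothetical helper mirroring the popen/fgets/pclose sequence.
CommandResult run_command(const std::string& command) {
    assert(!command.empty());
    CommandResult result{"", -1, 0};

    FILE* fp = popen(command.c_str(), "r");
    if (fp == NULL) {
        result.saved_errno = errno;
        return result;
    }

    char line[256];
    while (fgets(line, sizeof(line), fp) != NULL) {
        result.output += line;
    }

    // Clear errno first so a stale value is not mistaken for a pclose() error.
    errno = 0;
    result.raw_status = pclose(fp);
    result.saved_errno = errno;
    return result;
}
```

Logging raw_status in hexadecimal alongside the decoded signal number would show whether the upper bits of the status look like a shifted exit code or like an unrelated value.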
  • Can you edit your post and add any relevant code lines around the call to `WTERMSIG`? – eepp Jun 25 '12 at 21:13
  • Code added. Thanks for looking! – Andreas Yankopolus Jun 26 '12 at 01:26
  • I seriously have no clue. Signal numbers usually end at 63 for RT ones. What's the underlying architecture? Are you familiar with system tracing? – eepp Jun 26 '12 at 04:30
  • Glad to hear that I'm not missing something obvious. The underlying architecture is a Core2 Duo single-board computer manufactured by Kontron. I'm not familiar with system tracing but intend to become an strace expert today. Is that what you had in mind? – Andreas Yankopolus Jun 26 '12 at 12:41
  • Not exactly; I had something more in-depth in mind, something like [LTTng](http://lttng.org/). Try installing this, enable all kernel events and then look at the results using a viewer. Perhaps you will discover a bad pattern there, around your call to `pclose`. LTTng does not only trace the system calls: it has lots of trace points within important kernel routines, like signals delivery. – eepp Jun 26 '12 at 14:40
  • Turns out that Expect (which runs behind the scenes in unbuffer) will add 128 to received signals when generating a return code. But I think in my case the problem is due to a subtle race condition with another thread… – Andreas Yankopolus Jun 26 '12 at 20:22
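
The "128 plus signal" convention mentioned in the last comment can be sketched as follows. This is only an illustration of the convention as the comment describes it, not Expect's actual code, and the helper name signal_from_exit_code is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: if an exit code follows the "128 + signal" convention,
// return the signal number it encodes; otherwise return 0.
int signal_from_exit_code(int exit_code) {
    // Exit codes delivered through a wait status are limited to 8 bits.
    assert(exit_code >= 0 && exit_code <= 255);
    return exit_code > 128 ? exit_code - 128 : 0;
}
```

A process that reports a signal this way has still exited normally, so the parent would see WIFEXITED() true with WEXITSTATUS() above 128, for example 143 for SIGTERM (15), rather than WIFSIGNALED() true.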
