
On a system running Linux 2.6.35+, my program creates many child processes and monitors them. If a child process dies, I do some clean-up and spawn the process again. I use signalfd() to receive the SIGCHLD signal in my process. The signalfd descriptor is monitored asynchronously using libevent.

With ordinary signal handlers for non-real-time (standard) signals, further occurrences of a signal are blocked while its handler is running, to avoid recursive handler invocation. If several instances of that signal arrive in the meantime, the kernel invokes the handler only once (when the signal is unblocked).

Is the behavior the same when using signalfd()? Since signalfd-based handling doesn't have the typical problems associated with the asynchronous execution of normal signal handlers, I was thinking the kernel might queue all further occurrences of SIGCHLD.

Can anyone clarify the Linux behavior in this case?

alk
Manohar

2 Answers


On Linux, multiple children terminating before you read a SIGCHLD with signalfd() will be compressed into a single SIGCHLD. This means that when you read the SIGCHLD signal, you have to clean up after all children that have terminated:

// Do this after you've read() a SIGCHLD from the signalfd file descriptor:
while (1) {
    int status;
    pid_t pid = waitpid(-1, &status, WNOHANG);
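    if (pid <= 0) {
        // 0: children exist, but none has changed state yet
        // -1: error (ECHILD when there are no children left)
        break;
    }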
    // something happened with child 'pid', do something about it...
    // Details are in 'status', see waitpid() manpage
}

I should note that I have in fact seen this signal compression when two child processes terminated at the same time. When I did only a single waitpid(), one of the terminated children was not handled; the above loop fixed it.
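
The compression described above can be reproduced with a small sketch. It blocks SIGCHLD, forks a few children, waits until all of them have exited without reaping them (waitid() with WNOWAIT), and then counts how many SIGCHLD records the signalfd descriptor delivers before running the waitpid() loop. The names demo_sigchld_coalescing and struct reap_result are hypothetical and exist only for this illustration; the sketch also assumes the process has no other children and at most 16 are requested.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result type for this sketch. */
struct reap_result {
    int sigchld_records; /* SIGCHLD records read from the signalfd */
    int children_reaped; /* children collected by the waitpid() loop */
};

/* Returns 0 on success, -1 on error. nchildren must be between 1 and 16. */
int demo_sigchld_coalescing(int nchildren, struct reap_result *out)
{
    sigset_t mask, oldmask;
    pid_t pids[16];
    int i;

    if (nchildren < 1 || nchildren > 16 || out == NULL)
        return -1;
    out->sigchld_records = 0;
    out->children_reaped = 0;

    /* SIGCHLD must be blocked so that it is only delivered via signalfd. */
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &mask, &oldmask) < 0)
        return -1;

    int sfd = signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);
    if (sfd < 0) {
        sigprocmask(SIG_SETMASK, &oldmask, NULL);
        return -1;
    }

    for (i = 0; i < nchildren; i++) {
        pids[i] = fork();
        if (pids[i] == 0)
            _exit(i);            /* child: terminate immediately */
        if (pids[i] < 0) {
            nchildren = i;       /* only wait for the children that exist */
            break;
        }
    }

    /* Wait until every child has exited; WNOWAIT leaves it reapable. */
    for (i = 0; i < nchildren; i++) {
        siginfo_t info;
        while (waitid(P_PID, (id_t)pids[i], &info, WEXITED | WNOWAIT) < 0 &&
               errno == EINTR)
            ;
    }

    /* Drain the signalfd: count the SIGCHLD records that are available. */
    for (;;) {
        struct signalfd_siginfo si;
        ssize_t n = read(sfd, &si, sizeof si);
        if (n == (ssize_t)sizeof si) {
            if (si.ssi_signo == SIGCHLD)
                out->sigchld_records++;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        break;                   /* EAGAIN: nothing more is pending */
    }

    /* The loop from the answer: reap every child that has terminated. */
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid <= 0)
            break;
        out->children_reaped++;
    }

    close(sfd);
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    return 0;
}
```

Calling demo_sigchld_coalescing(3, &r) on Linux should leave r.sigchld_records smaller than r.children_reaped, which is why a single waitpid() per SIGCHLD is not enough.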

Corresponding documentation:

Alexis Wilke
Ambroz Bizjak
  • Thanks. A couple of questions.
    Let's say there are many events in the epoll queue that my process has not yet drained. In that case, are you saying that the kernel will queue only one read event for SIGCHLD on the signalfd even if n processes die? About using waitpid() in a loop: the problem I have with this approach is that you only get the exit status of the child process, but lose the other information that you would get from struct signalfd_siginfo when you read from signalfd (or siginfo_t when using sigaction). I guess there is no way to get that?
    – Manohar Dec 06 '11 at 11:07
  • @Santhosh note that epoll doesn't queue file descriptor events in the literal sense of it; rather, it only reports the state of file descriptors (readability, writability). So when events occur on a file descriptor that make it readable, it doesn't matter how many there are - epoll will just report readability. And the next time you do an epoll_wait(), it will do exactly the same (except if you use edge-triggered epoll - but it still won't report the number of events). About the struct signalfd_siginfo, I believe you're right. But what would you need from there in case of SIGCHLD anyway? – Ambroz Bizjak Dec 06 '11 at 11:24
  • @Santhosh: also note that signalfd() itself compresses the SIGCHLD signals, not epoll. This means not only will you not get multiple events from epoll, but you will only get a single SIGCHLD signal from the read(); the struct signalfd_siginfo of the others will be lost forever. – Ambroz Bizjak Dec 06 '11 at 11:27
  • Thanks. I will do waitpid() in a loop as you suggested. From signalfd_siginfo I was accessing the fields ssi_code and ssi_status to get the number of the signal that was sent to the child process, or codes like CLD_STOPPED or CLD_CONTINUED. But it looks like I should rather use macros like WIFEXITED and WIFSIGNALED, which need only the status value returned by waitpid. – Manohar Dec 06 '11 at 11:42
  • @Ambroz, `but you will only get a single SIGCHLD signal from the read(); the struct signalfd_siginfo of the others will be lost forever` -> Is it a bug or by design? Can you point to some document which describes it? I thought `signalfd` was a hassle-free, convenient way of handling child processes... – Vi. Jul 13 '13 at 08:55
  • @Vi I can't find any reference to this behavior, but I believe it is by design. Consider that if it were to deliver all signals to you, even multiple occurrences of the same one, the kernel would need to keep a buffer of pending signals, which could grow arbitrarily large. Consider that in a low-memory condition, a SIGCHLD could be ignored, and this would actually prevent you from reaping the child that died and releasing its memory. (Though theoretically, the kernel could preallocate the memory needed to buffer the SIGCHLD for you when the process is created...) – Ambroz Bizjak Jul 13 '13 at 12:56
  • @Vi I've peeked into the kernel source a bit and found this: https://github.com/torvalds/linux/blob/master/kernel/signal.c#L1062 and https://github.com/torvalds/linux/blob/master/kernel/signal.c#L898 . It appears all non-realtime signals are not queued, and the siginfo of these could be lost. – Ambroz Bizjak Jul 13 '13 at 13:47
  • Thanks. Maybe it should be documented everywhere (signalfd, kill, sigwaitinfo, signal(7))? – Vi. Jul 14 '13 at 20:03
  • @Vi.: Please see http://man7.org/linux/man-pages/man7/signal.7.html: "*By contrast, if multiple instances of a standard signal are delivered while that signal is currently blocked, then only one instance is queued*" – alk Jun 26 '16 at 12:05
  • The documentation for `sigwait()` (http://man7.org/linux/man-pages/man3/sigwait.3p.html) points in the same direction: "*If prior to the call to sigwait() there are multiple pending instances of a single signal number, it is implementation-defined whether upon successful return there are any remaining pending signals for that signal number.*" – alk Jun 26 '16 at 12:18
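
The macro-based decoding mentioned in the comments above can be sketched as follows. The idea is to derive, from the status returned by waitpid(), roughly the same information that ssi_code and ssi_status would carry. The function names cld_code_from_status, cld_status_from_status and classify_child are hypothetical helpers for this illustration only. The sketch does not distinguish ptrace stops (CLD_TRAPPED), and stop/continue states are only reported when waitpid() is called with WUNTRACED and WCONTINUED.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Maps a waitpid() status to the matching CLD_* code, or 0 if unknown. */
int cld_code_from_status(int status)
{
    if (WIFEXITED(status))
        return CLD_EXITED;
    if (WIFSIGNALED(status)) {
#ifdef WCOREDUMP
        if (WCOREDUMP(status))
            return CLD_DUMPED;
#endif
        return CLD_KILLED;
    }
    if (WIFSTOPPED(status))
        return CLD_STOPPED;
    if (WIFCONTINUED(status))
        return CLD_CONTINUED;
    return 0;
}

/* Exit code or signal number for the status, or -1 if unknown. */
int cld_status_from_status(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    if (WIFSTOPPED(status))
        return WSTOPSIG(status);
    if (WIFCONTINUED(status))
        return SIGCONT;
    return -1;
}

/* Demo helper: forks a child that either raises kill_sig (if non-zero)
 * or exits with exit_code, then returns the CLD_* code for its status.
 * Returns -1 if fork() or waitpid() fails. */
int classify_child(int exit_code, int kill_sig)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (kill_sig != 0)
            raise(kill_sig);
        _exit(exit_code);
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return cld_code_from_status(status);
}
```

In the reaping loop, cld_code_from_status(status) and cld_status_from_status(status) can then be used wherever the code previously looked at ssi_code and ssi_status.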

Actually, the hassle-free way would be the waitfd functionality, which would allow you to add a specific pid to poll()/epoll(). Unfortunately, it wasn't accepted into Linux when it was proposed years ago.

Pavel Šimerda
  • I really meant to add it as a comment, not an answer, sorry. – Pavel Šimerda Oct 23 '13 at 14:46
  • That functionality was implemented in Linux 5.2 with the `CLONE_PIDFD` flag to `clone` and was extended in Linux 5.3 with the `pidfd_open` syscall. Both return a new file descriptor that can be selected for readability with `select`, `poll`, or `epoll_wait` and waited upon with `waitid` with `P_PIDFD` to retrieve the exit status. This has the advantage that you can safely reap all children without using signals at all. – Matt Whitlock Jun 16 '20 at 21:26
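
A minimal sketch of the pidfd approach from the last comment, under several assumptions: the kernel is recent enough (pidfd_open needs Linux 5.3, and waitid with P_PIDFD needs Linux 5.4), SIGCHLD is not set to SIG_IGN, and the raw syscall is used because older C libraries have no wrapper. The fallback values for SYS_pidfd_open and P_PIDFD are assumptions for older headers and should be checked against your platform. The function name wait_child_via_pidfd is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks for older headers (assumed values; verify for your platform). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef P_PIDFD
#define P_PIDFD 3
#endif

/* Forks a child that exits with exit_code, waits for it through a pidfd,
 * and returns the exit code seen by the parent. Returns -1 with errno set
 * if the pidfd path is not available or fails. */
int wait_child_via_pidfd(int exit_code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(exit_code);

    int result = -1;
    int saved_errno = 0;
    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);

    if (pidfd < 0) {
        saved_errno = errno;
        waitpid(pid, NULL, 0);   /* fall back so the child is still reaped */
        errno = saved_errno;
        return -1;
    }

    /* The pidfd becomes readable once the child has terminated. */
    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    int rc;
    do {
        rc = poll(&pfd, 1, -1);
    } while (rc < 0 && errno == EINTR);

    siginfo_t info;
    memset(&info, 0, sizeof info);
    if (rc > 0 &&
        waitid((idtype_t)P_PIDFD, (id_t)pidfd, &info, WEXITED) == 0) {
        if (info.si_code == CLD_EXITED)
            result = info.si_status;
    } else {
        saved_errno = errno;
        waitpid(pid, NULL, 0);   /* fall back so the child is still reaped */
    }

    close(pidfd);
    if (result < 0)
        errno = saved_errno;
    return result;
}
```

In an event loop, the pidfd would be registered with epoll (or libevent) instead of being polled synchronously, one descriptor per child, so no SIGCHLD handling is needed for these children.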