1

I am writing a library that uses fork() and exec() to run external programs. The code in my library runs as a separate thread. The library thread needs to waitpid() on the forked process to know the exit code of the child process.

Unfortunately, if the application using my library registers a signal handler for SIGCHLD, the waitpid() call returns with the error ECHILD.

How do I deal with this as a library, with minimal impact on the application? I essentially want the child to remain a zombie and have control over when it's reaped.

Can I insulate myself from what the application decides to do? Can I hijack the signal handler in some way and put it back after my waitpid is done?

user2624119
  • 2
    You could fork your *own* process to handle your own process handling, and have the library communicate with the stand-alone process through pipes or some such [IPC](https://en.wikipedia.org/wiki/Inter-process_communication). Then you could also install your own signal-handlers, like e.g. your own `SIGCHLD` instead of relying only on `waitpid`. – Some programmer dude Sep 20 '18 at 20:30
  • 2
    Seconded. Unix domain datagram sockets are often useful for this. Unix domain sockets allow you to pass file descriptors, for example pipe descriptors to/from the forked grandchild process, back to the parent, using ancillary messages. Unix domain datagram messages are not reordered, and retain message boundaries (so a receive with a sufficiently large buffer will always receive complete messages); these properties make the communication simple and robust. You might not even need a separate thread for this approach; the helper process should suffice. – Nominal Animal Sep 20 '18 at 20:37
  • 4
    This is an age-old problem with the Unix fork/wait model, and I'm not sure there are any simple solutions. – Barmar Sep 20 '18 at 20:44
  • 1
    Yeah seems like a complicated problem indeed. Thanks for the responses ! Since my library is used by a very controlled set of users with known functionality I'm just tempted to say "DON'T register signal handlers". – user2624119 Sep 20 '18 at 20:46
  • 1
    As I understand it, the standard `system()` function suffers from the same problem. – Barmar Sep 20 '18 at 20:57
  • Instead of telling them not to install signal handlers, provide & document a function they should call at the beginning of their SIGCHLD handler to keep your library happy -- and take care that that function be signal-safe. And hope that other libraries the program uses don't do their process management themselves, but leave it to the main program, as your library should've done, too ;-) –  Sep 20 '18 at 22:50

2 Answers

1

Freeing resources allocated by a library is an application bug. It's the same in principle as if the application did something like free(yourlib_create_context(...)), although of course the consequences are less severe. There's no reason to try to support this kind of nonsensical application behavior; just document that it's invalid.

If you want to "shield" against this sort of programming error, mask signals and then call abort if waitpid fails with ECHILD. This will unconditionally terminate the process without the application having any chance to catch it.

R.. GitHub STOP HELPING ICE
1

You can do just as system does:

system() executes a command specified in command by calling /bin/sh -c command, and returns after the command has been completed. During execution of the command, SIGCHLD will be blocked, and SIGINT and SIGQUIT will be ignored.

An outline could be

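#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

pid_t pid;
int status;
sigset_t block, backup;

// Block SIGCHLD, saving the previous signal mask in backup
sigemptyset(&block);
sigaddset(&block, SIGCHLD);

sigprocmask(SIG_BLOCK, &block, &backup);

if ((pid = fork()) < 0) {
    // fork failed: restore the mask, then handle the error
    sigprocmask(SIG_SETMASK, &backup, NULL);
} else if (pid == 0) {    // child
    // restore the original mask so the exec'd program does not inherit the block
    sigprocmask(SIG_SETMASK, &backup, NULL);

    // ... exec the external program
} else {                  // parent
    // reap the child while SIGCHLD is still blocked
    waitpid(pid, &status, 0);

    sigprocmask(SIG_SETMASK, &backup, NULL);
}

// ...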
Paul