I am making a library that needs to spawn multiple processes.
I want to be able to know the set of all descendant processes that were spawned during a test. This is useful for terminating well-behaved daemons at the end of a passed test or for debugging deadlocks/hanging processes by getting the stack trace of any processes present after a failing test.
Since some of this requires spawning daemons (fork, fork again, then let the intermediate parent die), the daemons get reparented and we cannot find all processes by walking the process tree.
Currently my approach is:
- Register a handler using `os.register_at_fork`
- On fork, in the child, flock a file and append `(pid, process start time)` to another file
- Then, when required, we can get the set of child processes by iterating over the entries in the file and keeping the ones where (pid, process start time) matches an existing process
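A minimal sketch of the steps above, to make the question concrete. The file locations and the names `REGISTRY`, `LOCK`, `start_time`, `record_self` and `live_descendants` are hypothetical, chosen only for illustration. The start time is taken from field 22 of `/proc/<pid>/stat`, which is one possible way to get it, so this assumes Linux with `/proc` mounted.

```python
import fcntl
import os
import tempfile

# Hypothetical file locations, one registry per root test process.
REGISTRY = os.path.join(tempfile.gettempdir(), f"proc_registry_{os.getpid()}.txt")
LOCK = REGISTRY + ".lock"


def start_time(pid):
    """Return the start time of `pid` in clock ticks since boot."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name (field 2) may contain spaces or parentheses,
    # so split after the last ')' before counting fields.
    fields = stat.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state), so field 22 (starttime) is fields[19].
    return int(fields[19])


def record_self():
    """Runs in the child after every fork: append (pid, start time)."""
    pid = os.getpid()
    entry = f"{pid} {start_time(pid)}\n"
    with open(LOCK, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # released when the file is closed
        with open(REGISTRY, "a") as reg:
            reg.write(entry)


def live_descendants():
    """Return recorded pids whose (pid, start time) still match a process."""
    try:
        with open(REGISTRY) as reg:
            entries = [tuple(map(int, line.split())) for line in reg if line.strip()]
    except FileNotFoundError:
        return set()
    alive = set()
    for pid, started in entries:
        try:
            if start_time(pid) == started:
                alive.add(pid)
        except (FileNotFoundError, ProcessLookupError):
            pass  # process is gone (or vanished while being read)
    return alive


os.register_at_fork(after_in_child=record_self)
```

Note that in this sketch a child that has exited but has not been reaped yet (a zombie) still has a `/proc/<pid>/stat` entry, so it is still reported by `live_descendants()`.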
The downsides of this approach are:
- Only works with `multiprocessing` or `os.fork` - does not work when spawning a new Python process using `subprocess` or a non-Python process.
- Locking around the fork may make things more deterministic during tests than they will be in reality, hiding race conditions.
I am looking for a different way to track child processes that avoids these two downsides.
Alternatives I have considered:
- Using bcc to register probes of fork/clone - the problem with this is that it requires root, which I think would be kind of annoying for running tests from a contributor point-of-view. Is there something similar that can be done as an unprivileged user just for the current process and descendants?
- Using strace (or ptrace) similar to above - the problem with this is the performance impact. Several of the tests are specifically benchmarking startup time and ptrace has a relatively large overhead. Maybe it would be less so if only tracking fork and clone, but it still conflicts with the desire to get the stacks on test timeout.
Can someone suggest an approach to this problem that avoids the pitfalls and downsides of the ones above? I am only interested in Linux right now, and ideally it shouldn't require a kernel later than 4.15.