I am making a library that needs to spawn multiple processes.

I want to be able to know the set of all descendant processes that were spawned during a test. This is useful for terminating well-behaved daemons at the end of a passed test or for debugging deadlocks/hanging processes by getting the stack trace of any processes present after a failing test.

Because some of the processes are daemons (fork, fork again, then let the intermediate parent die), they are reparented and no longer sit under the test process, so we cannot find all of them by walking the process tree.

Currently my approach is:

  1. Register a handler using os.register_at_fork.
  2. On fork, in the child, flock a lock file and append (pid, process start time) to a log file.
  3. When required, get the set of child processes by iterating over the entries in the log file and keeping the ones where (pid, process start time) matches an existing process.
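The three steps above can be sketched roughly as follows. This is a minimal illustration and not the actual library code: the file locations and the names `install`, `live_children`, and `start_time` are made up for the example, and the start time is assumed to come from field 22 of `/proc/<pid>/stat` (Linux only).

```python
import fcntl
import os
import tempfile

# Hypothetical file locations; any directory writable by the tracked processes works.
_TRACK_DIR = tempfile.mkdtemp(prefix="proc-track-")
LOG_PATH = os.path.join(_TRACK_DIR, "pids.log")
LOCK_PATH = os.path.join(_TRACK_DIR, "pids.lock")


def start_time(pid):
    """Return the start time of `pid` in clock ticks since boot, or None if it is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The command name (field 2) may contain spaces or parentheses, so split
    # after the last ")". starttime is field 22, i.e. index 19 of the remainder.
    return int(stat.rsplit(")", 1)[1].split()[19])


def _record_child():
    """Runs in the child right after fork: append (pid, start time) under a lock."""
    pid = os.getpid()
    with open(LOCK_PATH, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        with open(LOG_PATH, "a") as log:
            log.write(f"{pid} {start_time(pid)}\n")
    # Closing the lock file releases the flock.


def install():
    """Register the handler; it fires for os.fork and multiprocessing's fork method."""
    os.register_at_fork(after_in_child=_record_child)


def live_children():
    """Return the recorded pids whose (pid, start time) still match a process."""
    with open(LOCK_PATH, "a") as lock:
        fcntl.flock(lock, fcntl.LOCK_SH)
        try:
            with open(LOG_PATH) as log:
                entries = [line.split() for line in log]
        except FileNotFoundError:
            entries = []
    return {
        int(pid) for pid, started in entries if start_time(int(pid)) == int(started)
    }
```

Comparing the start time as well as the pid guards against pid reuse. Note that a child that has exited but has not been reaped yet still has a `/proc` entry, so it is still reported.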

The downsides of this approach are:

  1. Only works with multiprocessing or os.fork - does not work when spawning a new Python process using subprocess or a non-Python process.
  2. Locking around the fork may make things more deterministic during tests than they will be in reality, hiding race conditions.

I am looking for a different way to track child processes that avoids these 2 downsides.

Alternatives I have considered:

  1. Using bcc to register probes of fork/clone - the problem with this is that it requires root, which I think would be kind of annoying for running tests from a contributor point-of-view. Is there something similar that can be done as an unprivileged user just for the current process and descendants?
  2. Using strace (or ptrace) similar to above - the problem with this is the performance impact. Several of the tests are specifically benchmarking startup time and ptrace has a relatively large overhead. Maybe it would be less so if only tracking fork and clone, but it still conflicts with the desire to get the stacks on test timeout.

Can someone suggest an approach to this problem that avoids the pitfalls and downsides of the ones above? I am only interested in Linux right now, and ideally it shouldn't require a kernel later than 4.15.

Chris Hunt
  • Can you track things by looking for everything that's in the process group of the first process, or do the descendant processes change their pgrp? – Mark Plotnick May 09 '19 at 18:26
  • The descendant processes do change their process group. I tried the ptrace approach and did not like the fact that I could not easily use strace or gdb on the processes under test, so I think I am probably going to make a library with a `__libc_start_main` shim and set `LD_PRELOAD` at the start of testing so all child processes pick it up. I believe all processes I am concerned about are dynamically linked so it should cover my use case. – Chris Hunt May 09 '19 at 21:20

2 Answers

For subprocess.Popen, there's the preexec_fn argument, which takes a callable that runs in the child between fork and exec -- you can hack your way through it to record the child's pid.
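A minimal sketch of the preexec_fn idea, assuming a made-up log location and a hypothetical `record_pid` helper. The callable runs in the forked child just before exec, and the pid stays the same across the exec, so whatever it records identifies the final process. It only covers processes launched through Popen calls you control, and the subprocess documentation warns that preexec_fn is not safe in the presence of threads.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical log location, for illustration only.
LOG_PATH = os.path.join(tempfile.mkdtemp(prefix="proc-track-"), "pids.log")


def record_pid():
    """Runs in the forked child just before exec; the pid survives the exec."""
    # O_APPEND keeps each short record intact when several children write at once.
    fd = os.open(LOG_PATH, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.write(fd, f"{os.getpid()}\n".encode())
    finally:
        os.close(fd)


# Launch a trivial child Python process with the hook attached.
proc = subprocess.Popen([sys.executable, "-c", "pass"], preexec_fn=record_pid)
proc.wait()

# Read back what the child recorded about itself.
with open(LOG_PATH) as log:
    recorded = [int(line) for line in log]
```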

Alternatively, take a look at cgroups (control groups) -- I believe they can handle tricky situations such as daemon creation and so forth.

a small orange
  • Thanks, preexec_fn covers one gap I mentioned above. Everything I'm reading about cgroups seems to indicate that you need to be privileged or do some prior setup to allow unprivileged processes to use it. – Chris Hunt May 07 '19 at 22:49
  • Unfortunately, yes, as they are used to control system resources, such as CPU shares or memory limits for groups of processes. – a small orange May 07 '19 at 22:57

Given the constraints from my original post, I used the following approach:

  1. putenv("PID_DIR", <some tempdir>)
  2. For the current process, override fork and clone with versions that record the process start time in $PID_DIR/<pid>. The override is done using plthook and applies to all loaded shared objects. dlopen should also be overridden so that the same overrides are applied to any libraries loaded dynamically later.
  3. Set LD_PRELOAD to a library with implementations of __libc_start_main, fork, and clone, so that child processes that exec a new program pick up the tracking too.
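On the reading side, the set of tracked processes could be recovered from the directory along these lines. This is only a sketch under assumptions: it supposes each file in $PID_DIR is named after a pid and contains that process's start time as a decimal number of clock ticks (field 22 of `/proc/<pid>/stat`), and `tracked_children` is a hypothetical name, not necessarily what the library does.

```python
import os


def start_time(pid):
    """Return the start time of `pid` in clock ticks since boot, or None if it is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Split after the last ")" because the command name may contain spaces;
    # starttime is field 22, i.e. index 19 of what follows.
    return int(stat.rsplit(")", 1)[1].split()[19])


def tracked_children(pid_dir):
    """Return the pids in `pid_dir` whose recorded start time matches a live process."""
    alive = set()
    for name in os.listdir(pid_dir):
        if not name.isdigit():
            continue  # ignore anything that is not a <pid> entry
        try:
            with open(os.path.join(pid_dir, name)) as f:
                recorded = int(f.read().strip())
        except (OSError, ValueError):
            continue  # entry vanished or is still being written
        # A reused pid has a different start time, so stale entries drop out.
        if start_time(int(name)) == recorded:
            alive.add(int(name))
    return alive
```

With step 1 in place, this would be called as `tracked_children(os.environ["PID_DIR"])`.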

An initial implementation is available here, and it is used like this:

import process_tracker; process_tracker.install()

import os

pid1 = os.fork()
pid2 = os.fork()
pid3 = os.fork()

if pid1 and pid2 and pid3:
    print(process_tracker.children())
Chris Hunt