How do you kill a process and its children on a timeout in Go code?

Question

I have a situation where I need to kill a process after some time. I start the process and then:

case <-time.After(timeout):
        if err := cmd.Process.Kill(); err != nil {
            return 0, fmt.Errorf("Failed to kill process: %v", err)
        }

This kills the process, but only the parent, not the 5-10 child processes that the main process starts. I also tried creating a process group and then calling:

syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)

to kill the main process together with its children and grandchildren, but that did not work. Is there any other way to kill all of these processes?

Side note: it's bad practice to wrap an underlying error by discarding everything but its string representation. E.g. your returned error gives the caller no sane way of testing for a `syscall.Errno` value (one of the possible reasons the kill may fail). If you really need to wrap errors, see `os.PathError` and `strconv.NumError` as examples of a better way. — Dave C, Oct 03 '15 at 15:30
This can be solved by creating a process group for command and then killing the process group instead of the process itself. — Varun, Oct 05 '15 at 18:43
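
The side note about error wrapping can be illustrated with the `%w` verb of `fmt.Errorf`, which arrived in Go 1.13 (later than these comments) and keeps the underlying error reachable through `errors.Is`. The following is only a sketch for Unix-like systems: `signalPID` and `isNoSuchProcess` are hypothetical helper names, and the PID `1<<30` is assumed to be above the kernel's PID limit so that the call fails.

```go
//go:build !windows

package main

import (
	"errors"
	"fmt"
	"syscall"
)

// signalPID is a hypothetical helper: it sends sig to pid and wraps any
// failure with %w, so the underlying syscall.Errno stays inspectable.
func signalPID(pid int, sig syscall.Signal) error {
	if err := syscall.Kill(pid, sig); err != nil {
		return fmt.Errorf("failed to signal process %d: %w", pid, err)
	}
	return nil
}

// isNoSuchProcess reports whether err wraps ESRCH ("no such process").
func isNoSuchProcess(err error) bool {
	return errors.Is(err, syscall.ESRCH)
}

func main() {
	// Signal 0 only checks whether the target exists. PID 1<<30 is assumed
	// to be above the kernel's PID limit, so the call is expected to fail.
	err := signalPID(1<<30, 0)
	fmt.Println(err)
	fmt.Println("no such process:", isNoSuchProcess(err))
}
```

A caller can then decide, for example, that "no such process" after a timeout is harmless because the target has already exited, which is not possible when only the error string survives.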

score 4 · Answer 1 · answered Oct 06 '15 at 02:13

I think this is what you need:

cmd := exec.Command(command, arguments...)

// This sets up a process group which we kill later.
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}

if err := cmd.Start(); err != nil {
    return err
}

// A buffered channel is important so the goroutine doesn't
// block forever and leak if the function returns
// after the timeout.
done := make(chan error, 1)

go func() {
    done <- cmd.Wait()
}()

select {
case err := <-done:
    // this will be nil if no error
    return err
case <-time.After(time.Second):
    // We created a process group above which we kill here.
    pgid, err := syscall.Getpgid(cmd.Process.Pid)
    if err != nil {
        return err
    }
    // Note the minus sign: a negative PID targets the whole process group.
    if err := syscall.Kill(-pgid, syscall.SIGTERM); err != nil {
        return err
    }
    return fmt.Errorf("Timeout")
}

Minor note, the `Setpgid` field and the `syscall.Getpgid` call are OS specific. In particular they don't exist on Microsoft Windows so you may want to put such things in a file with [`// +build !windows`](https://golang.org/pkg/go/build/#hdr-Build_Constraints) near the top. — Dave C, Oct 06 '15 at 16:18
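
For reference, here is the same idea as a complete program, offered as a sketch rather than a definitive implementation. The names `runWithTimeout` and `errTimeout` are hypothetical. It assumes a Unix-like system with `sh` and `sleep` on the PATH, uses the newer `//go:build` spelling of the build constraint (Go 1.17 and later), sends SIGKILL where the answer sends SIGTERM, and waits for the direct child after killing the group so it does not linger as a zombie.

```go
//go:build !windows

package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
	"time"
)

// errTimeout is a hypothetical sentinel so callers can detect a timeout.
var errTimeout = errors.New("command timed out")

// runWithTimeout (hypothetical name) starts the command in its own process
// group and kills the whole group if it is still running after timeout.
func runWithTimeout(timeout time.Duration, name string, args ...string) error {
	cmd := exec.Command(name, args...)
	// Setpgid with the default Pgid of 0 makes the child the leader of a
	// new group whose ID equals the child's PID.
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		return fmt.Errorf("start %s: %w", name, err)
	}

	// Buffered so the goroutine can finish even if nobody receives.
	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case err := <-done:
		return err
	case <-timer.C:
		// A negative PID addresses every process in the group.
		if err := syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL); err != nil {
			return fmt.Errorf("kill process group %d: %w", cmd.Process.Pid, err)
		}
		<-done // reap the direct child
		return errTimeout
	}
}

func main() {
	// The shell starts one sleep in the background (a grandchild of this
	// program) and one in the foreground; both share the new group.
	err := runWithTimeout(500*time.Millisecond, "sh", "-c", "sleep 30 & sleep 30")
	fmt.Println(err)
}
```

Note that a descendant that moves itself into a different process group (for example by calling `setsid`) is not reached by this approach.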

score 0 · Answer 2 · answered Jun 02 '20 at 03:36

It is not clear whether you have control over the code of those child processes. If you do, consider the following Linux feature (you also don't say which OS you are targeting).

This line of code, called in each child, asks the kernel to send that child a SIGHUP when its parent dies. That way your Go process can just kill the parent, and the kernel signals all the children for you. Because the kernel does the delivery, no child that made the call is missed.

prctl(PR_SET_PDEATHSIG, SIGHUP);

Of course, there is a race condition if you do just that: by the time the child calls prctl(), the parent may already have died, in which case the child needs to exit immediately.

if(getppid() != parent_pid)
{
    exit(1);
}

So the complete code to avoid the race condition is:

// must happen before the fork() call
const pid_t parent_pid = getpid();

const pid_t child_pid = fork();

if(child_pid != 0)
{
    // fork() failed (child_pid == -1) or worked (an actual PID)
    ...
    return;
}

prctl(PR_SET_PDEATHSIG, SIGHUP);

if(getppid() != parent_pid)
{
    exit(1);
}

Note: it is customary to use SIGHUP for this situation. You may want to consider other signals too, especially if the children deal with pipes/sockets (in which case you are likely to ignore SIGHUP!) or need to handle SIGHUP for other reasons.
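
If the intermediate parent happens to be written in Go as well, the same request can probably be made without C: on Linux, `syscall.SysProcAttr` has a `Pdeathsig` field. The sketch below assumes Linux and uses a hypothetical helper name, `startWithPdeathsig`. One caveat to keep in mind: per prctl(2), the "parent" is the thread that created the child, so in a Go program the signal may fire when that OS thread exits, not only when the whole process does.

```go
//go:build linux

package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// startWithPdeathsig is a hypothetical helper. It starts a command and asks
// the kernel to send that command SIGHUP when its parent goes away.
func startWithPdeathsig(name string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(name, args...)
	// Pdeathsig makes the child request the death signal for itself
	// before it executes the new program.
	cmd.SysProcAttr = &syscall.SysProcAttr{Pdeathsig: syscall.SIGHUP}
	if err := cmd.Start(); err != nil {
		return nil, fmt.Errorf("start %s: %w", name, err)
	}
	return cmd, nil
}

func main() {
	cmd, err := startWithPdeathsig("sleep", "1")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("started child", cmd.Process.Pid)
	fmt.Println("wait:", cmd.Wait())
}
```

This only covers processes that the Go program starts directly; grandchildren started by some other program still have to make the prctl() call themselves.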

Now, if you do not have any control over the code of the child processes, you could try to kill each one from your Go application: find all the children, kill them one by one, and then kill the parent process. However, there is always a race condition that you can't avoid unless you can prevent that whole tree of children from creating new processes. If you can do that, then it's just a matter of recording the PIDs of all those children and killing them one by one.
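
As a rough illustration of the "find all the children" step, the sketch below walks `/proc` on Linux and collects every descendant of a given PID by reading the parent PID from each `/proc/<pid>/stat`. The helper names `parentOf` and `descendants` are hypothetical, and the snapshot is not atomic, so it is subject to exactly the race described above.

```go
//go:build linux

package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strconv"
)

// parentOf reads /proc/<pid>/stat and returns the parent PID.
func parentOf(pid int) (int, error) {
	data, err := os.ReadFile(filepath.Join("/proc", strconv.Itoa(pid), "stat"))
	if err != nil {
		return 0, err
	}
	// The command name is parenthesised and may itself contain spaces or
	// parentheses, so only parse what follows the last ')'.
	i := bytes.LastIndexByte(data, ')')
	if i < 0 {
		return 0, fmt.Errorf("unexpected format in /proc/%d/stat", pid)
	}
	fields := bytes.Fields(data[i+1:])
	if len(fields) < 2 {
		return 0, fmt.Errorf("too few fields in /proc/%d/stat", pid)
	}
	// fields[0] is the state, fields[1] is the parent PID.
	return strconv.Atoi(string(fields[1]))
}

// descendants returns the PIDs of all processes below root, based on a
// single, non-atomic scan of /proc.
func descendants(root int) ([]int, error) {
	entries, err := os.ReadDir("/proc")
	if err != nil {
		return nil, err
	}
	children := make(map[int][]int) // parent PID -> child PIDs
	for _, e := range entries {
		pid, err := strconv.Atoi(e.Name())
		if err != nil {
			continue // not a process directory
		}
		ppid, err := parentOf(pid)
		if err != nil {
			continue // the process probably exited during the scan
		}
		children[ppid] = append(children[ppid], pid)
	}
	var out []int
	queue := []int{root}
	for len(queue) > 0 {
		p := queue[0]
		queue = queue[1:]
		out = append(out, children[p]...)
		queue = append(queue, children[p]...)
	}
	return out, nil
}

func main() {
	// Start a throwaway child so there is something to find.
	cmd := exec.Command("sleep", "30")
	if err := cmd.Start(); err != nil {
		fmt.Println(err)
		return
	}
	pids, err := descendants(os.Getpid())
	fmt.Println("descendants:", pids, err)
	cmd.Process.Kill()
	cmd.Wait()
}
```

Each PID in the result could then be passed to `syscall.Kill`, keeping in mind that a PID may have exited or been reused between the scan and the kill.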

Of course, if you can create a process group, that is much better. As with the SIGHUP approach above, killing all the members of a group is done by the kernel, so it won't miss any processes.
