I solved this by extending ExecuteWatchdog, i.e.:
    import java.util.stream.Stream;

    import org.apache.commons.exec.ExecuteWatchdog;
    import org.apache.commons.exec.Watchdog;

    // Assumes the enclosing class declares a logger field named `log`
    // (SLF4J-style "{}" placeholders).
    static class DestroyDescendantsExecutorWatchdog extends ExecuteWatchdog {

        private Process process = null;

        /**
         * Creates a new watchdog with a given timeout.
         *
         * @param timeout the timeout for the process in milliseconds. It must be
         *                greater than 0 or 'INFINITE_TIMEOUT'
         */
        public DestroyDescendantsExecutorWatchdog(long timeout) {
            super(timeout);
        }

        /**
         * Overridden to capture and store the process to monitor.
         *
         * @param processToMonitor the process to monitor. It cannot be {@code null}
         */
        @Override
        public synchronized void start(Process processToMonitor) {
            super.start(processToMonitor);
            process = processToMonitor;
        }

        /**
         * Overridden to clear the stored process when monitoring stops.
         */
        @Override
        public synchronized void stop() {
            super.stop();
            process = null;
        }

        /**
         * Overrides the default behavior to collect the descendants and kill them as well.
         * (The method name is spelled "timeoutOccured" in the Commons Exec API.)
         *
         * @param w the watchdog that timed out.
         */
        @Override
        public synchronized void timeoutOccured(Watchdog w) {
            // if the process is defined
            if (process != null) {
                boolean isProcessTerminated = true;
                try {
                    // We must check whether the process already stopped before we got here
                    process.exitValue();
                } catch (final IllegalThreadStateException itse) {
                    // the process is not terminated; if this is really
                    // a timeout and not a manual stop, then destroy it.
                    if (isWatching()) {
                        isProcessTerminated = false;
                    }
                }
                // if we haven't already started terminating the process
                if (!isProcessTerminated) {
                    // get all the descendants before destroying the root process
                    Stream<ProcessHandle> descendants = process.toHandle().descendants();
                    // now go destroy the root process
                    super.timeoutOccured(w);
                    // follow up by destroying the descendants as well
                    descendants.forEach(descendant -> {
                        try {
                            descendant.destroy();
                        } catch (Exception e) {
                            log.warn("pid={};info={}; Could not destroy descendant", descendant.pid(), descendant.info(), e);
                        }
                    });
                    // no longer watching this process as it's destroyed
                    process = null;
                }
            }
        }
    }
The real magic is destroying the descendants along with the root process.
I have a unit test that basically runs this:
bash "script.sh"
where script.sh is just a "sleep 5".
Before this change, the stock ExecuteWatchdog would time out the process with an exit code of 143, but only after the full 5 seconds. After replacing ExecuteWatchdog with DestroyDescendantsExecutorWatchdog and a 5 ms timeout, the unit test exits almost immediately with the expected return code of 143.
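For anyone who wants to try the core idea without Commons Exec, here is a minimal JDK-only sketch (Java 9+) of the same step: snapshot the descendants, destroy the root, then destroy the descendants. The class name DestroyTreeSketch and the method destroyTree are made up for illustration. It also assumes a Unix-like system with sh on the PATH, and it uses an inline "sleep 30; true" command rather than the script.sh from my test.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

public class DestroyTreeSketch {

    /**
     * Destroys the root process and every descendant that existed at call time.
     * Returns the descendants that were asked to terminate.
     */
    static List<ProcessHandle> destroyTree(Process root) {
        // Snapshot first: once the root exits, its children are re-parented
        // and can no longer be reached through the root's handle.
        List<ProcessHandle> descendants =
                root.toHandle().descendants().collect(Collectors.toList());
        // Request normal termination of the root process.
        root.destroy();
        // Then request termination of everything it had spawned.
        descendants.forEach(ProcessHandle::destroy);
        return descendants;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // The trailing "; true" is there so the shell forks sleep as a child
        // instead of replacing itself with it (assumption: typical sh behavior).
        Process root = new ProcessBuilder("sh", "-c", "sleep 30; true").start();
        Thread.sleep(500); // give the shell time to start the sleep child

        long start = System.nanoTime();
        List<ProcessHandle> destroyed = destroyTree(root);
        boolean exited = root.waitFor(5, TimeUnit.SECONDS);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        System.out.println("root exited: " + exited);
        if (exited) {
            System.out.println("exit code: " + root.exitValue());
        }
        System.out.println("descendants destroyed: " + destroyed.size());
        System.out.println("elapsed ms: " + elapsedMs);
    }
}
```

The ordering matters here for the same reason as in the watchdog above: the descendants are looked up through the root's handle, so the snapshot has to be taken before the root is destroyed.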