
I am working on fixing a bug that makes our CI/CD pipeline fail. During an integration test, we spin up a local database instance. To do this, we use some MariaDB wrappers to launch it from a Java codebase.

This process can take a long time to finish, which causes our tests to time out. To handle this, we added functionality that kills the process if it cannot install within 20 seconds and then tries again.

This part seems to be working.

The strange bit comes when trying to destroy the process. It seems to randomly take ~2-3 MINUTES to be unblocked. This is problematic for the same reason as above: it causes our tests to time out.

Upon investigating the underlying libraries, it seems we are using ExecuteWatchdog to manage the process. The bit of code that is blocking is:

watchDog.destroyProcess();
// this part usually returns nearly instantly

try {
  // this part can take minutes...
  resultHandler.waitFor();
} catch (InterruptedException e) {
  throw handleInterruptedException(e);
}

In addition to this, the behavior differs between Mac and Linux. If I do something like resultHandler.waitFor(1000) (wait with a 1000 ms timeout before just exiting), it works fine on a MacBook, but on Linux I see an error like: java.io.FileNotFoundException: {{executable}} (Text file busy)

Any ideas on this?

I have done some research and it seems like watchDog.destroyProcess is sending a SIGTERM instead of a SIGKILL. But I do not have any hooks to get the Process object in order to send it the KILL instead.

Thanks.

Anthony

1 Answer


A common cause of blocking when working with processes is that the process is blocked on output, either to stdout or (more likely to be overlooked) stderr.

In this context, setting up tests on a CI server, you might try redirecting both the standard output and the error output of the sub-process to INHERIT.

Note that this means that you won't be able to read the sub-process output or error stream in your Java code. My assumption is that you aren't trying to do that anyway, and that's why the process hangs. Instead, that output will be redirected to the output of the Java process, and I expect your CI server will log it as part of the build.
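As a rough illustration of the idea, here is a minimal sketch using the plain JDK ProcessBuilder rather than the wrapper library from the question, so how you would wire this into ExecuteWatchdog is left open. The class name InheritIoExample and the helper runInherited are hypothetical names made up for this example. It inherits the parent's streams so the child cannot block on a full pipe, waits with a timeout, and falls back to destroyForcibly() if the child does not exit in time.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

class InheritIoExample {

    // Hypothetical helper: runs a command with inherited stdio and returns its
    // exit code, or -1 if it had to be forcibly terminated after the timeout.
    static int runInherited(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Equivalent to setting stdin, stdout and stderr to Redirect.INHERIT,
        // so the child writes straight to the Java process's own streams.
        builder.inheritIO();
        Process process = builder.start();

        // Bounded wait instead of blocking indefinitely.
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            // Forcible termination; the exact mechanism is platform dependent.
            process.destroyForcibly();
            process.waitFor();
            return -1;
        }
        return process.exitValue();
    }

    public static void main(String[] args) throws Exception {
        // Use the currently running JVM binary as a harmless test command.
        String java = ProcessHandle.current().info().command().orElse("java");
        int exit = runInherited(List.of(java, "-version"), 30);
        System.out.println("exit code: " + exit);
    }
}
```

If you only want to redirect output and error (and not stdin), redirectOutput(ProcessBuilder.Redirect.INHERIT) and redirectError(ProcessBuilder.Redirect.INHERIT) can be set individually instead of calling inheritIO().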

erickson