I have an application that creates zombie processes every 1-2 seconds in my cluster. I use Process in my app, but only when I receive specific commands, which is not the case right now.
String command = "helm install release xxx";
LOGGER.debug("handle Install request : command [{}]", command);
waitForNormalTermination(Runtime.getRuntime().exec(command), INSTALL_TIMEOUT, TimeUnit.SECONDS, name);
private void waitForNormalTermination(Process process, int timeout, TimeUnit unit, String release) throws Exception {
    try {
        if (!process.waitFor(timeout, unit)) {
            throw new TimeoutException("Timeout while executing " + process.info().commandLine().orElse(null));
        }
        if (process.exitValue() != 0) {
            String errorStreamOutput = IOUtils.toString(process.getErrorStream(), StandardCharsets.UTF_8);
            if (errorStreamOutput != null && errorStreamOutput.contains("release: not found")) {
                throw new ReleaseNotFoundException(release);
            }
            throw new Exception("Process termination was abnormal, exit value: [" + process.exitValue()
                    + "], command:[" + process.info().commandLine().orElse(null)
                    + "] error returned:[" + errorStreamOutput + "]");
        }
    } finally {
        // Added here to simplify the example; in the real code every process is destroyed like this before the method exits.
        process.destroy();
    }
}
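Two details of the method above are worth ruling out, whether or not they relate to the zombies: the error stream is only read after waitFor, so a child that writes more than the pipe buffer holds can block, and destroy() only requests termination without waiting for the child to exit. A minimal sketch of an alternative, assuming a hypothetical helper class CommandRunner (not part of my code) and using only the standard ProcessBuilder API; it sends stdout and stderr to a temporary file and waits for the child after a forced kill:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, for illustration only.
public class CommandRunner {

    /** Exit code and combined stdout/stderr of a finished command. */
    public static final class Result {
        public final int exitCode;
        public final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    public static Result run(List<String> command, long timeout, TimeUnit unit)
            throws IOException, InterruptedException, TimeoutException {
        // Send stdout and stderr to a temp file so the child never blocks on a full pipe.
        Path log = Files.createTempFile("command-", ".log");
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(log.toFile())
                    .start();
            try {
                if (!process.waitFor(timeout, unit)) {
                    throw new TimeoutException("Timeout while executing " + command);
                }
                String output = new String(Files.readAllBytes(log), StandardCharsets.UTF_8);
                return new Result(process.exitValue(), output);
            } finally {
                if (process.isAlive()) {
                    // destroyForcibly() only requests termination; waitFor() blocks until the child has exited.
                    process.destroyForcibly().waitFor();
                }
            }
        } finally {
            Files.deleteIfExists(log);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumes a POSIX "sh" is available on the PATH.
        Result result = run(List.of("sh", "-c", "echo hello; exit 3"), 5, TimeUnit.SECONDS);
        System.out.println(result.exitCode + " " + result.output.trim());
    }
}
```

Passing the command as a list also avoids the whitespace splitting that Runtime.exec(String) performs. This is a sketch under those assumptions, not a confirmed fix for the zombie count.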
Here is what I did:
#1 - added a destroy() call in the finally block of my code
#1b - built and published the image
#2 - killed my pod in my cluster
#3 - my pod was recreated with the new image
#4 - checked the node where I had zombies (the same node my application was running on).
I killed the java process that was generating zombies. I had over 12,000 zombies; now I'm back down to 4,200.
#5 - I ran this in a loop to see whether new zombies appear:
ps aux | grep 'Z' | wc -l
And yes, they are still increasing. Right now I have this:
root@test-pcl111:~# ps aux | grep 'Z' | wc -l
4487
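One caveat on that count: grep 'Z' matches any line containing a capital Z, which includes the ps header (the VSZ column) and usually the grep process itself, so the number is slightly too high. A stricter count can be sketched in Java, assuming the Linux /proc layout where each /proc/<pid>/stat line reads "pid (comm) state ppid ..."; the class name ZombieCounter is hypothetical. It counts only processes in state Z and groups them by parent PID:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only; Linux-specific.
public class ZombieCounter {

    /**
     * Returns the parent PID if the stat line describes a zombie, otherwise -1.
     * The comm field may contain spaces or parentheses, so the remaining
     * fields are located relative to the last ')'.
     */
    public static int zombieParent(String statLine) {
        int close = statLine.lastIndexOf(')');
        if (close < 0) {
            return -1;
        }
        String[] fields = statLine.substring(close + 1).trim().split("\\s+");
        if (fields.length < 2 || !fields[0].equals("Z")) {
            return -1;
        }
        return Integer.parseInt(fields[1]);
    }

    /** Maps each parent PID to the number of zombie children found under procRoot. */
    public static Map<Integer, Integer> countZombiesByParent(Path procRoot) throws IOException {
        Map<Integer, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(procRoot, "[0-9]*")) {
            for (Path dir : dirs) {
                try {
                    String stat = new String(Files.readAllBytes(dir.resolve("stat")), StandardCharsets.UTF_8);
                    int parent = zombieParent(stat);
                    if (parent >= 0) {
                        counts.merge(parent, 1, Integer::sum);
                    }
                } catch (IOException e) {
                    // The process disappeared between listing and reading; skip it.
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        countZombiesByParent(Paths.get("/proc"))
                .forEach((ppid, n) -> System.out.println(n + " zombie(s) with parent PID " + ppid));
    }
}
```

Run on the node itself, this lists which parent PIDs are accumulating zombies, which is the same information the ps pipeline is after.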
I ran this in another terminal to see whether there is any activity:
kubectl logs iep-iep-codec-staging-7596fccd85-jkn68 --follow
The zombies keep increasing every 1-2 seconds even when there is no activity on my side other than a few periodic REST calls (polling from other applications). At this point I'm not calling my method that creates a new Process.
Is there something I missed?
EDIT: I created a small script that prints the zombies per parent application on a node.
#!/bin/bash
# Count zombies per parent PID and store the "count ppid" pairs.
Files="/tmp/zombie.file"
ps -eo ppid,comm | grep "<defunct>" | awk '{print $1}' | sort | uniq -c > "$Files"

# For each parent, print the number of zombies and the parent's process details.
while read -r count ppid
do
    echo "Zombies found = $count"
    ps -f "$ppid"
done < "$Files"

echo " "
echo " "
echo "Running docker containers are "
# that line was to grep only our containers from our private repo
#docker ps | grep private-repository
echo " "
echo " "
echo "the PID of those docker containers"
for value in $(docker ps | grep private-repository | cut -d ' ' -f1); do
    docker inspect --format '{{ .State.Pid }}' "$value"
done
EDIT
I have the same issue with containerd. It looks like the problem is with the exec probes.