
I have an application that creates zombie processes every 1-2 seconds in my cluster. The app does use Process, but only when it receives specific commands, which is not the case right now.

String command = "helm install release xxx";
LOGGER.debug("handle Install request : command [{}]", command);
waitForNormalTermination(Runtime.getRuntime().exec(command), INSTALL_TIMEOUT, TimeUnit.SECONDS, name);

private void waitForNormalTermination(Process process, int timeout, TimeUnit unit, String release) throws Exception {
try {
    if (!process.waitFor(timeout, unit)) {
        throw new TimeoutException("Timeout while executing " + process.info().commandLine().orElse(null));
    }

    if (process.exitValue() != 0) {
        String errorStreamOutput = IOUtils.toString(process.getErrorStream(), StandardCharsets.UTF_8);
        if (errorStreamOutput != null && errorStreamOutput.contains("release: not found")) {
            throw new ReleaseNotFoundException(release);
        }

        throw new Exception("Process termination was abnormal, exit value: [" + process.exitValue() + "], command:[" + process.info().commandLine().orElse(null) + "] error returned:[" + errorStreamOutput + "]");
    }
} finally {
    process.destroy();   // simplified for this post; every process is destroyed like this before the method exits
}
}
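For comparison, here is a minimal sketch of a stricter way to run the external command. It is not the code from my application: the class name ProcessRunner and its Result type are made up for illustration. It drains the child's output while it runs (so a full pipe buffer cannot stall the child), and on timeout it kills the child and then waits for it so the exit status is collected. It assumes Java 9+ and that the child does not leave grandchildren holding the pipe open. It would only help if the zombies were direct children of the JVM, which may not be the case here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ProcessRunner {

    /** Hypothetical holder: exit code plus merged stdout/stderr of a finished child. */
    public static final class Result {
        public final int exitCode;
        public final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    public static Result run(List<String> command, long timeout, TimeUnit unit)
            throws IOException, InterruptedException, TimeoutException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)   // merge stderr into stdout: one stream to drain
                .start();

        // Read the output in the background while the child is still running.
        CompletableFuture<String> output = CompletableFuture.supplyAsync(() -> {
            try (InputStream in = process.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                return "";   // stream closed early, e.g. because the child was killed
            }
        });

        try {
            if (!process.waitFor(timeout, unit)) {
                throw new TimeoutException("Timeout while executing " + command);
            }
            return new Result(process.exitValue(), output.join());
        } finally {
            if (process.isAlive()) {
                process.destroyForcibly();   // kill the child on timeout or error
                process.waitFor();           // wait until it has really exited
            }
        }
    }
}
```

With something like this, the check for "release: not found" would be done on result.output instead of reading the error stream after the process has already exited.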

Here is what I did:

#1 - Added the destroy() call in the finally block shown above.
#1b - Built and published the image.
#2 - Killed my pod in the cluster.
#3 - The pod was recreated with the new image.
#4 - I looked at the node where I had zombies (the same node my application was running on).
        I killed the java process that was generating zombies. I had over 12,000 zombies; now I'm back down to 4,200.
#5 - I ran:  ps aux | grep 'Z' | wc -l
       in a loop to see whether new zombies appear, and yes, they are still increasing.
       Now I have this: root@test-pcl111:~# ps aux | grep 'Z' | wc -l
       4487
  I also ran: kubectl logs iep-iep-codec-staging-7596fccd85-jkn68 --follow

in another terminal to see whether there was any activity.

The zombies are still increasing every 1-2 seconds, even when there is no activity on my side other than a few periodic REST calls (polling from other applications). At this point I'm not calling the method that creates a new Process.

Is there something I missed?

EDIT: I wrote a small script that prints the zombies on a node, grouped by parent application.

#!/bin/bash
ps -eo ppid,comm | grep "<defunct>" | awk '{print $1}' | sort | uniq -c > /tmp/zombie.file
Files="/tmp/zombie.file"
Lines=$(cat $Files | tr -s ' ' | cut -d ' ' -f2,3)

i=0;

for Line in $Lines
do
   if [[ $i -eq 0 ]]
     then
       echo "Zombies found = $Line"
       i=1
   else
       ps -f $Line
       i=0
   fi
done

echo " "
echo " "

echo "Running docker containers are "

# that line was to grep only our containers from our private repo
#docker ps | grep private-repository

echo " "
echo " "

echo "the PID of those docker containers"
for value in $(docker ps | grep private-repository  | cut -d ' ' -f1); do
  docker inspect --format '{{ .State.Pid }}' $value
done
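The per-parent count can also be taken without ps, by reading /proc directly. The sketch below is an illustration only: the class ZombieCounter and its method names are made up. It assumes Linux, Java 11+, and the usual /proc/<pid>/stat layout, where the state and the parent PID are the first two fields after the closing parenthesis of the command name. Unlike ps aux | grep 'Z', it only counts processes whose state really is Z.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class ZombieCounter {

    /** Returns the parent PID if the stat line describes a zombie, otherwise -1. */
    public static int zombieParent(String statLine) {
        // The command name is wrapped in parentheses and may itself contain
        // spaces or parentheses, so split after the LAST closing parenthesis.
        int close = statLine.lastIndexOf(')');
        if (close < 0) {
            return -1;
        }
        String[] fields = statLine.substring(close + 1).trim().split("\\s+");
        if (fields.length < 2 || !fields[0].equals("Z")) {
            return -1;
        }
        return Integer.parseInt(fields[1]);
    }

    /** Counts zombies per parent PID under the given proc root (normally /proc). */
    public static Map<Integer, Integer> countByParent(Path procRoot) throws IOException {
        Map<Integer, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(procRoot, "[0-9]*")) {
            for (Path dir : dirs) {
                try {
                    int ppid = zombieParent(Files.readString(dir.resolve("stat")));
                    if (ppid >= 0) {
                        counts.merge(ppid, 1, Integer::sum);
                    }
                } catch (IOException | RuntimeException e) {
                    // The process went away or its stat line was unreadable; skip it.
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        countByParent(Paths.get("/proc")).forEach((ppid, count) ->
                System.out.println(count + " zombies, parent PID " + ppid));
    }
}
```

Run on the node (not inside the pod), the parent PIDs it prints can be compared with the container PIDs from docker inspect to see which container owns the zombies.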

EDIT

I have the same issue with containerd. It looks like the problem is with the exec probes.

Sebastien Dionne