
I am watching the processes with htop, and I see that the child process stays a zombie even though I clean it up with a waitpid call. Any idea why this might happen?

Thank you very much!

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>

#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>

void child_signal_handler(int signal) {
  printf("Someone stabbed me with signal %d\n", signal);
}

int main(int argc, char** argv)
{
  const pid_t child = fork();

  if (child == 0) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = &child_signal_handler;
    sigaction(SIGTERM, &sa, NULL);
    printf("Child is started in busy loop\n");
    while (true)
      ;
  } else {
    const int mercy_period = 3;
    printf("Parent is angry and gonna kill his child in %d sec\n", mercy_period);
    sleep(mercy_period);
    kill(child, SIGTERM);
    // clean-up zombie child process as it is terminated before we can wait on
    // it
    int status = 0;
    while(waitpid(-1, &status, WNOHANG) > 0);
  }

  return EXIT_SUCCESS;
}
Validus Oculus
  • `waitpid` will return 0 if the only child process hasn't terminated yet, and the `while` loop then exits immediately. What is the first return value of `waitpid` you are seeing? – Tony Tannous Aug 12 '20 at 05:52
  • You are sending `kill (0, SIGTERM)`, which sends the signal to every process in the caller's process group. There is no guarantee whether the parent or the child is terminated first. Next, `WNOHANG` means return immediately if no child has exited, so you are not waiting for the child to terminate. – David C. Rankin Aug 12 '20 at 05:53
  • Let me make some changes in the code to try it. – Validus Oculus Aug 12 '20 at 05:55
  • `printf` in a signal handler is usually not a good idea. – molbdnilo Aug 12 '20 at 05:57
  • @molbdnilo True, doing any I/O or calling most library functions in a signal handler is not a good idea. This is just test code for me to trace it. – Validus Oculus Aug 12 '20 at 05:59
  • @DavidC.Rankin, thank you both, the info you shared was very useful. It seems another issue was that my signal handler just swallowed the signal, so the child process never actually ended. Once I put abort() in the signal handler, it ended the child process and the wait call worked fine. Regarding kill(), I thought it wouldn't return until the process was dead; since I have only one child, I expected it to behave like a blocking call. – Validus Oculus Aug 12 '20 at 06:12

1 Answer


From the comments in the glibc implementation of waitpid:

If PID is (pid_t) -1, match any process. If the WNOHANG bit is set in OPTIONS, and that child is not already dead, return (pid_t) 0.

When you call it, the child has not terminated yet, so waitpid returns 0 and the while loop exits immediately, because 0 > 0 is false.

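Change the else branch as follows and send SIGKILL instead of SIGTERM. SIGKILL cannot be caught or ignored, so your handler can no longer keep the child alive: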

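} else {
    const int mercy_period = 3;
    printf("Parent is angry and gonna kill his child in %d sec\n", mercy_period);
    sleep(mercy_period);
    kill(child, SIGKILL);  // SIGKILL cannot be caught or ignored

    int status = 0;
    // With WNOHANG, waitpid returns 0 while the child has not terminated yet,
    // so keep polling until it returns the child's PID (or -1 on error).
    pid_t pid = waitpid(-1, &status, WNOHANG);
    while (!pid) {
        pid = waitpid(-1, &status, WNOHANG);
        printf("%d\n", pid);
    }
}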

After a few attempts, waitpid returns the PID of the child process, and the zombie is reaped.


Tony Tannous
  • Interesting, I thought kill() wouldn't return until the process was dead. It seems that is not the case. My signal handler was wrong; I added abort() so that the child process actually dies. – Validus Oculus Aug 12 '20 at 06:15
  • The phenomenon is explained very well in https://stackoverflow.com/questions/8679226/does-a-kill-signal-exit-a-process-immediately. Putting it here for others to read as well. – Validus Oculus Aug 12 '20 at 06:22