3

While building a shell program I'm facing an issue of recognizing processes states. The description of the issue I'm facing with is that I have a list of child processes and I'm trying to figure out their state using waitpid and WNOHANG. I wish to distinguish between 3 states: TERMINATED, RUNNING and SUSPENDED. (as defined in the code below) I wish to change the processes states to one of these three above, however right now this function makes running processes statuses to be terminated, and this function also doesn't recognize suspended processes. I would like to know what am I doing wrong and how should the function updateProcessList be written to achieve it?

#define TERMINATED  -1
#define RUNNING 1
#define SUSPENDED 0

typedef struct process{
    cmdLine* cmd;                     /* the parsed command line*/
    pid_t pid;                        /* the process id that is running the command*/
    int status;                       /* status of the process: RUNNING/SUSPENDED/TERMINATED */
    struct process *next;             /* next process in chain */
} process;

void updateProcessList(process **process_list) {
    process *p = *process_list;
    int code = 0, status = 0,pidd = 0;
    while (p) {
        pidd = p->pid;
        code = waitpid(pidd, &status, WNOHANG);
        if (code == -1) {            /* child terminated*/
            p->status = TERMINATED;
        } else if(WIFEXITED(status)){
            p->status = TERMINATED;
        }else if(WIFSTOPPED(status)){
            p->status = SUSPENDED;
        }
        p = p->next;
    }
}
Marco Bonelli
  • 63,369
  • 21
  • 118
  • 128
  • 1
    "I would like to know what am I doing wrong". Can you please tell us what specific problems you are encountering? – kaylum May 07 '20 at 23:49
  • 1
    What doesn't work? What happens when you run your program, and what were you expecting to happen? – Nate Eldredge May 07 '20 at 23:49
  • I editted the post to be more specific – Avi Ferdman May 07 '20 at 23:53
  • Are you saying the state becomes `TERMINATED` even when the process is actually running? And how are you stopping the process to test the `SUSPENDED` state? – kaylum May 08 '20 at 00:02
  • 1
    That is, please give us the exact test case - what test was run, what was the exact expected result and what was the exact actual result. – kaylum May 08 '20 at 00:09
  • yes exactly. And I test that using debugging and printings. – Avi Ferdman May 08 '20 at 00:09
  • Where do you set `RUNNING`? Please show the code as a [minimal verifiable example](https://stackoverflow.com/help/minimal-reproducible-example) – kaylum May 08 '20 at 00:10
  • when the process is being created it's status set to be `RUNNING`, from then on it's status stays as it is, until this function is called. This function goal is to change the processes statuses. – Avi Ferdman May 08 '20 at 00:16
  • @AviFerdman Running in the debugger, are you able to tell us which cases sets the `TERMINATED`? Is it the `if (code == -1)` case? – kaylum May 08 '20 at 00:25

1 Answers1

4

From man 2 waitpid:

RETURN VALUE

    waitpid():  on  success, returns the process ID of the child whose state has changed;
    if WNOHANG was specified and one or more child(ren) specified by pid exist, but  have
    not yet changed state, then 0 is returned.  On error, -1 is returned.

You should check the return value for 0... and also fix the rest of the checks.

code = waitpid(ppid, &status, WNOHANG | WUNTRACED | WCONTINUED);

if (code == -1) {
    // Handle error somehow... 
    // This doesn't necessarily mean that the child was terminated!
    // See manual page section "ERRORS".

    if (errno == ECHILD) {
        // Child was already terminated by something else.
        p->status = TERMINATED;
    } else {
        perror("waitpid failed");
    }
} else if (code == 0) {
    // Child still in previous state.
    // Do nothing.
} else if (WIFEXITED(status)) {
    // Child exited.
    p->status = TERMINATED;
} else if (WIFSIGNALED(status)) {
    // Child killed by a signal.
    p->status = TERMINATED;
} else if (WIFSTOPPED(status)) {
    // Child stopped.
    p->status = SUSPENDED;
} else if (WIFCONTINUED(status)) {
    // This branch seems unnecessary, you should already know this
    // since you are the one that should kill(pid, SIGCONT) to make the
    // children continue.
    p->status = RUNNING; 
} else {
    // This should never happen!
    abort();
}

Also, notice:

  1. My addition of WUNTRACED and WCONTINUED in the flags: WIFSTOPPED() cannot happen unless you are tracing the child with ptrace() or you used the WUNTRACED flag, and WIFCONTINUED() cannot happen unless WCONTINUED is used.
  2. The code and ppid variables should be pid_t, not int (the ppid variable also seems unneeded).

In any case, consider adding a signal handler for SIGCHLD and updating the children statuses there. Your program will receive a SIGCHLD for every child that terminates/stops/resuems. It's much simpler and also faster (does not require to continuously call waitpid() on every single child process).

Marco Bonelli
  • 63,369
  • 21
  • 118
  • 128
  • I agree with the return value problem and was going to comment on that too. But what isn't clear to me is how that would result in `TERMINATED` being set because presumably the `-1` case would not come into play if the process is still running? – kaylum May 08 '20 at 00:19
  • @kaylum if the return value is `-1`, then the caller did something wrong. It doesn't mean that the child has stopped. If you look at the manual page you have a list of possible errors when `-1` is returned: these errors are either invalid arguments, non-existing child, etc. – Marco Bonelli May 08 '20 at 00:23
  • I understand that and agree that could be happening. But I can't see from the code why `waitpid` would be returning `-1` in the OP's code. It may be, but just not obvious to me. – kaylum May 08 '20 at 00:24
  • @kaylum just because it doesn't seem like it can happen it doesn't mean that it shouldn't be checked. If OP did everything correctly, `-1` will never be returned. If something strange happens (for example if some other part of the code `wait`s for the child and the child gets reaped, or if the `->pid` gets corrupted), then `-1` will be returned. – Marco Bonelli May 08 '20 at 00:26
  • I think you are missing my point. I totally agree it needs to be fixed. What I'm not sure of is whether doing that would be sufficient to fix the OP's problem. That is, do you see any other potential issue that could result in the problem described or is it certainly the `-1` case that is the root cause? Anyway, hopefully fixing that will indeed resolve the problem. – kaylum May 08 '20 at 00:30
  • @kaylum the root cause of OP's problem is the `WIFSTOPPED` case. `waitpid()` does not return `-1`, it returns `0`, but since the first branch is not entered, the second one is erroneously entered because `status` was initialized to `0` (ant it matches `WIFEXITED`). – Marco Bonelli May 08 '20 at 00:35
  • Hmmm, that last comment doesn't make sense to me. Even the [waitpid man](https://linux.die.net/man/2/waitpid) has example code that does exactly that `WIFEXITED` check when `waitpid` returns `0`. – kaylum May 08 '20 at 00:38
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/213378/discussion-between-marco-bonelli-and-kaylum). – Marco Bonelli May 08 '20 at 00:38