2

Edit: I should first clarify, when waitpid does not work, it does not work for all processes. As suggested I printed out the return value of waitpid and received interesting results. Firstly, during the unsuccessful runs, waitpid() returns 0 even when WIFEXITED(stats) returns 1. How could the child process have no change in status but return completed?

Secondly, I used a dummy program that prints out a string argument every 1 second for a specified number of times. (This is how I tracked if a program completed). I noticed that during successful runs, the waitpid value was not printed out during context switching, but after all the processes finished running.

like this: (here assuming each prog takes 2 quotas to complete) "prog1 run" "prog2 run" "prog3 run" "prog1 run" "prog2 run" "prog3 run" waitpid: 0 waitpid: 0 waitpid: 0 ...

on the other hand, an unsuccessful run gave me this: "prog1 run" waitpid: 0 program termination detected "prog2 run" waitpid: 0 program termination detected "prog3 run" waitpid: 0 program termination detected

TLDR: is it possible for waitpid(child_PID, stat, WNOHANG) to give a different WIFEXITED(stat) in different runnings of the same program?

I am coding a round robin scheduler. The parent process forks n child processes, which each run a process in the scheduler. Using signals SIGCONT and SIGSTOP, as well as the usleep() function, the parent is able to allocate a specified time quotas for each of the child processes to run sequentially in a cycle. At the end of each quota, the parent checks to see if any process has completed. It does so using the waitpid(child_PID, stat, WNOHANG); and then WIFEXITED(stat). If the process has completed, the parent will not allocate any more time quotas for that process in subsequent cycles.

However, I noticed that in every other time I run the code, WIFEXITED(stat) gives me a 1 after the first cycle of quotas, even thought I have ensured that every process should run much longer than said quota. I know for a fact the programs should not have been completed, because my test programs involve printing a specified number of lines before they exit. Strangest of all is the WIFEXITED gives me the wrong results exactly every OTHER run, and on the first cycle.

I have included the code in case anyone is patient enough to read it. Hopefully, reading the code is not necessary to understand the problem. For those kind enough to read it, thank you this means a lot, and perhaps you might know why my program does not terminate? When t runs correctly, it schedules all the processes correctly and runs them until all of them terminates, but does not terminate itself.

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <signal.h>
#include <stdbool.h>

int main(int argc, char *argv[]) {
  int tick_interval = 10000;
  char opt;
  while ((opt = getopt(argc, argv, "t:")) != -1) {
    switch (opt) {
    case 't':
      tick_interval = atoi(optarg);
      break;
    default:
      goto usage;
    }
  }

  if (optind >= argc) {
    goto usage;
  }

  char *filepath = argv[optind];//filepath to textfile containing name of programs and arguments.
  int n;
  FILE * fp;
  char * line = NULL;
  size_t len = 0;
  ssize_t read;

  printf("parent PID: %d\n", getpid());
  fp = fopen(filepath, "r");
  if (fp == NULL)
      exit(EXIT_FAILURE);
  int PID;
  int *prog_tracker = malloc(0);
  int line_counter = 0;
  int word_counter;
  int word_length;
  char ***lines = malloc(0);
  while ((read = getline(&line, &len, fp)) != -1) {
      //printf("round %d\n", line_counter);
      word_counter = 0;
      word_length = 0;
      lines = realloc(lines, (++line_counter) * sizeof(char**));
      lines[line_counter - 1] = malloc(0);
      int char_counter;
      bool is_new = 1;
      for (char_counter = 0; char_counter < read; char_counter ++) {
          if (is_new) {
              is_new = 0;
              lines[line_counter - 1] = realloc(lines[line_counter - 1], ++word_counter * sizeof(char*));
              lines[line_counter - 1][word_counter - 1] = malloc(0);
          }
          lines[line_counter - 1][word_counter - 1] = realloc(lines[line_counter - 1][word_counter - 1], ++word_length * sizeof(char));
          if (line[char_counter] == ' '||line[char_counter] == '\0' || line[char_counter] == '\n' || line[char_counter] == EOF) {
              is_new = 1;
              lines[line_counter - 1][word_counter - 1][word_length - 1] = '\0';
              word_length = 0;
          } else {
              lines[line_counter - 1][word_counter - 1][word_length - 1] = line[char_counter];
          }
      }
      //first line states number of cores to be used. To be implemented. Ignored for now.
      if (line_counter != 1) {
          PID = fork();
          if (PID != 0) {
              printf("PID: %d created at: %d\n", PID, line_counter);
              kill(PID, SIGSTOP);
              prog_tracker = realloc(prog_tracker, (line_counter - 1)  * sizeof(int));
              prog_tracker[line_counter - 2] = PID;
          } else {
              char *arguments[word_counter + 1];
              int counter;
              for (counter = 0; counter < word_counter; counter ++) {
                  arguments[counter] = lines[line_counter - 1][counter];
              }
              arguments[word_counter] = NULL;
              execv(arguments[0], arguments);//child processes implement processes in file.
              break;
          }
      }
  }
  free(lines);
  fclose(fp);
  if (line)
      free(line);

  if (PID != 0) {
      printf("parent running %d\n", getpid());
      int proc_num = 0;
      int prog_num = line_counter - 1;
      printf("prog_num: %d\n", prog_num);
      while (prog_num != 0) { //The while loop should break when all programs have finished, but it does not.
          kill(prog_tracker[proc_num], SIGCONT);
          usleep(tick_interval * 1000);
          kill(prog_tracker[proc_num], SIGSTOP);
          int stat;
           printf("status: %d", waitpid(prog_tracker[proc_num], &stat, WNOHANG)); //I now print out the return of waitpid.
          printf("%d\n", WIFEXITED(stat));
          if (WIFEXITED(stat)) {
              //printf("%d\n", WIFEXITED(stat));
              printf("program termination detected\n");
              prog_tracker[proc_num] = 0;
              prog_num -= 1;
              printf("processes left %d\n", prog_num);
          }
          proc_num = (++proc_num) % (line_counter - 1);
          while(prog_tracker[proc_num] == 0) {
              proc_num = (++proc_num) % (line_counter - 1);
          }
       }
       printf("All programs ended.");//This never gets printed!
  }
  return 0;
  • 2
    You should check `waitpid()`'s return value to make sure it was successful before trying to use `stat`. – Shawn Feb 25 '20 at 16:26
  • 1
    With your relatively big program it is difficult to see possible errors. You should simplify your code to create a [mre]. I suggest to replace the processing of command line arguments and input file with fixed values. Add debug output to see to which PID the result of `waitpid` belongs. Check the return value of all functions, maybe a function reports an error. – Bodo Feb 25 '20 at 16:27
  • the posted code does not compile! amongst other things, it is missing a final `}` – user3629249 Feb 25 '20 at 20:21
  • The posted code causes the compiles to output LOTS of warnings, including this critical one: *untitled1.c:108:20: warning: operation on ‘proc_num’ may be undefined [-Wsequence-point]* and again on line 110 – user3629249 Feb 25 '20 at 20:24
  • the posted code is missing the statement: `#include ` for the `waitpid()` function – user3629249 Feb 25 '20 at 20:26
  • 1
    OT: suggest, when compiling, to always enable the warnings, then fix those warnings. ( for `gcc`, at a minimum use: `-Wall -Wextra -Wconversion -pedantic -std=gnu11` ) Note: other compilers use different options to produce the same results – user3629249 Feb 25 '20 at 20:28
  • regarding: `if (PID != 0) { printf("parent running %d\n", getpid());` and `if (PID != 0) { printf("PID: %d created at: %d\n", PID, line_counter);` The function: `fork()` has three(3) kinds of returns: <0 indicates an error occurred. ==0 indicates in child process >0 indicates in parent process. The code should be checking for all three conditions, not assuming that everything worked correctly – user3629249 Feb 25 '20 at 20:32
  • OT: regarding: `if (fp == NULL) exit(EXIT_FAILURE);` should always tell the user what went wrong. Suggest, before calling `exit()` call: `perror( "fopen failed");` so both that error message and the text reason the system thinks the error occurred are output to `stderr`. – user3629249 Feb 25 '20 at 20:37
  • regarding: `printf("All programs ended.");` this will be placed in the `stdout` buffer and (at best) will be output to the terminal after the program exits. Suggest: `printf( "%s\n", "All programs ended.");` Notice the addition of the format string and the trailing `\n` at the end of the format string. The `'\n' will force the `stdout` buffer to be immediately output to the terminal – user3629249 Feb 25 '20 at 20:41
  • regarding: `int *prog_tracker = malloc(0);` This invokes undefined behavior. Suggest: `int *prog_tracker = NULL;` – user3629249 Feb 25 '20 at 20:45
  • regarding: `char ***lines = malloc(0);` Please read my prior comment and [three star programmer](https://wiki.c2.com/?ThreeStarProgrammer) – user3629249 Feb 25 '20 at 20:48
  • regarding: `arguments[word_counter] = NULL; execv(arguments[0], arguments);//child processes implement processes in file. break;` a call to any of the `exec*` functions can fail. Therefore, the call should be followed by: `perror( "execv failed" ); exit( EXIT_FAILURE );` this results in: 1) the user is informed of the failure and 2) the child process does not continue to execute into the parent process code – user3629249 Feb 25 '20 at 20:55
  • OT: regarding: `int counter; for (counter = 0; counter < word_counter; counter ++)` it is always best to limit the scope of local variables. so these statements would be better written as: `for ( int counter = 0; counter < word_counter; counter ++ ) – user3629249 Feb 25 '20 at 20:58
  • OT: regarding: `lines = realloc(lines, (++line_counter) * sizeof(char**));` the function: `realloc()` can fail. When it fails, the variable `lines` will be set to NULL. Then the code will crash when ever the pointer `lines` is dereferenced. Therefore, suggest: `char *** temp = realloc(lines, (++line_counter) * sizeof(char**));' if( ! temp ) { perror( "realloc failed" ); // cleanup, then exit( EXIT_FAILURE ); } // implied else, realloc successful. lines = temp;` – user3629249 Feb 25 '20 at 21:06
  • As Shawn suggested, I printed out the values of waitpid(), and realized that it printed out 0 even when WIFEXITED returned a value of 1. strange. – Marine Biologist Kujo Feb 26 '20 at 00:55
  • Thank you very much - user3629249 for your helpful input. However, I noticed that waitpid() is still recognised as a function despite not including . – Marine Biologist Kujo Feb 26 '20 at 01:06
  • when I compiled the posted code, under ubuntu linux, with the `gcc` compiler, it outputs a message about the function: `waitpid()` not being prototyped, ( therefore, all parameters AND the returned value are assumed to be type `int`. – user3629249 Feb 26 '20 at 18:07

2 Answers2

2

this kind of code:

        PID = fork();
        if (PID != 0) 
        {   << then running parent process or error occurred
            printf("PID: %d created at: %d\n", PID, line_counter);
            kill(PID, SIGSTOP);
            prog_tracker = realloc(prog_tracker, (line_counter - 1)  * sizeof(int));
            prog_tracker[line_counter - 2] = PID;
        }

if the call to fork() was successful, is killing the child process. Probably not what you want.

user3629249
  • 16,402
  • 1
  • 16
  • 17
  • Hi! Thanks for the reply! May I know why it s the child process? Couldnt the parents continue the child process later with SIGCONT? – Marine Biologist Kujo Feb 26 '20 at 00:41
  • when `fork()` returns, there are three possible results. 1) <0 indicates an error occurred. 2) ==0 indicates the child is running 3) >0 is the PID of the child (and the parent is executing.) – user3629249 Feb 26 '20 at 18:02
1

waitpid() can and most likely will give inconsistent results in the example. If you would get its status variable initialized properly ("int stat = 0;") like you almost always should, you would find out why right away as the code would not work as expected at all. This is but a single example why at least somewhat bug-resistant coding style is a must for a C programmer!

The reason for the behaviour is that if the requested process is still running then waitpid() just returns the value of 0 and does not even touch your "stat" variable so it remains uninitialized. You can probably do some dirty hack to get around the issue for example by initializing "stat" each time with something like 0xFF but the proper way to go is to check the return value of waitpid(... WNOHANG) and only process the rest if the return value is not 0. In most cases this is actually the only thing you need to check and "WIFEXITED(stat)" is not even needed unless you are really interested in how exacltly and with what return value your child process has terminated.

mrKirushko
  • 60
  • 1
  • 6