Edit: To clarify first: when waitpid() misbehaves, it misbehaves for every child process in that run. As suggested, I printed out the return value of waitpid() and got interesting results. Firstly, during the unsuccessful runs, waitpid() returns 0 even when WIFEXITED(stat) returns 1. How can the child process have had no change in status and yet be reported as exited?
Secondly, I used a dummy program that prints its string argument once per second for a specified number of times (this is how I tracked whether a program completed). I noticed that during successful runs, the waitpid value was not printed at each context switch, but only after all the processes had finished running.
A successful run looks like this (assuming each program takes 2 quotas to complete): "prog1 run" "prog2 run" "prog3 run" "prog1 run" "prog2 run" "prog3 run" waitpid: 0 waitpid: 0 waitpid: 0 ...
An unsuccessful run, on the other hand, gave me this: "prog1 run" waitpid: 0 program termination detected "prog2 run" waitpid: 0 program termination detected "prog3 run" waitpid: 0 program termination detected
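For reference, a rough sketch of what such a dummy test program does. This is a hypothetical stand-in, not the exact program used here: the function name print_repeatedly and its parameters are assumptions made for illustration.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for the dummy test program: print `msg` `count`
 * times, sleeping `delay_s` seconds after each line.
 * Returns the number of lines printed. */
int print_repeatedly(const char *msg, int count, unsigned int delay_s)
{
    int printed = 0;
    for (int i = 0; i < count; i++) {
        printf("%s\n", msg);
        fflush(stdout);   /* push the line out before the process can be stopped */
        sleep(delay_s);
        printed++;
    }
    return printed;
}
```

Calling print_repeatedly("prog1 run", 5, 1) from a small main() would give a program that should need roughly five seconds of run time, which makes an early "termination detected" easy to spot.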
TLDR: Is it possible for waitpid(child_PID, stat, WNOHANG) to give a different WIFEXITED(stat) across different runs of the same program?
I am coding a round-robin scheduler. The parent process forks n child processes, each of which runs one of the scheduled programs. Using the signals SIGCONT and SIGSTOP, as well as the usleep() function, the parent allocates a specified time quota to each child process in turn, so that they run sequentially in a cycle. At the end of each quota, the parent checks whether the process has completed. It does so by calling waitpid(child_PID, stat, WNOHANG) and then WIFEXITED(stat). If the process has completed, the parent will not allocate any more time quotas to that process in subsequent cycles.
However, I noticed that every other time I run the code, WIFEXITED(stat) gives me a 1 after the first cycle of quotas, even though I have ensured that every process should run much longer than said quota. I know for a fact that the programs cannot have completed, because my test programs print a specified number of lines before they exit. Strangest of all, WIFEXITED gives me the wrong result on exactly every OTHER run, and always on the first cycle.
I have included the code in case anyone is patient enough to read it. Hopefully, reading the code is not necessary to understand the problem. For those kind enough to read it, thank you, this means a lot, and perhaps you might know why my program does not terminate? When it runs correctly, it schedules all the processes correctly and runs them until all of them terminate, but it does not terminate itself.
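On the non-termination: one thing visible in the loop is that once the last entry of prog_tracker has been set to 0, the inner search for the next non-zero PID has nothing left to find and keeps cycling. A hedged sketch of a search that gives up after one full pass is shown here. The helper next_active is hypothetical, and whether this is the only reason the parent hangs is an assumption.

```c
#include <sys/types.h>

/* Hypothetical helper: starting just after `current` and wrapping around,
 * return the index of the first slot whose PID is non-zero.
 * Returns -1 if all `count` slots are 0 (nothing left to schedule). */
int next_active(const pid_t *pids, int count, int current)
{
    for (int step = 1; step <= count; step++) {
        int idx = (current + step) % count;
        if (pids[idx] != 0)
            return idx;
    }
    return -1;
}
```

The scheduler loop could then stop as soon as next_active() returns -1, rather than relying only on the prog_num counter in the outer condition.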
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h> // declares waitpid() and WIFEXITED()

int main(int argc, char *argv[]) {
    int tick_interval = 10000;
    int opt; // getopt() returns an int; a plain char may never compare equal to -1
    while ((opt = getopt(argc, argv, "t:")) != -1) {
        switch (opt) {
        case 't':
            tick_interval = atoi(optarg);
            break;
        default:
            goto usage;
        }
    }
    if (optind >= argc) {
        goto usage;
    }
    char *filepath = argv[optind]; // path to the text file containing program names and arguments
    int n;
    FILE *fp;
    char *line = NULL;
    size_t len = 0;
    ssize_t read;
    printf("parent PID: %d\n", getpid());
    fp = fopen(filepath, "r");
    if (fp == NULL)
        exit(EXIT_FAILURE);
    int PID;
    int *prog_tracker = malloc(0);
    int line_counter = 0;
    int word_counter;
    int word_length;
    char ***lines = malloc(0);
    while ((read = getline(&line, &len, fp)) != -1) {
        //printf("round %d\n", line_counter);
        word_counter = 0;
        word_length = 0;
        lines = realloc(lines, (++line_counter) * sizeof(char**));
        lines[line_counter - 1] = malloc(0);
        int char_counter;
        bool is_new = 1;
        for (char_counter = 0; char_counter < read; char_counter ++) {
            if (is_new) {
                is_new = 0;
                lines[line_counter - 1] = realloc(lines[line_counter - 1], ++word_counter * sizeof(char*));
                lines[line_counter - 1][word_counter - 1] = malloc(0);
            }
            lines[line_counter - 1][word_counter - 1] = realloc(lines[line_counter - 1][word_counter - 1], ++word_length * sizeof(char));
            if (line[char_counter] == ' '||line[char_counter] == '\0' || line[char_counter] == '\n' || line[char_counter] == EOF) {
                is_new = 1;
                lines[line_counter - 1][word_counter - 1][word_length - 1] = '\0';
                word_length = 0;
            } else {
                lines[line_counter - 1][word_counter - 1][word_length - 1] = line[char_counter];
            }
        }
        //first line states number of cores to be used. To be implemented. Ignored for now.
        if (line_counter != 1) {
            PID = fork();
            if (PID != 0) {
                printf("PID: %d created at: %d\n", PID, line_counter);
                kill(PID, SIGSTOP);
                prog_tracker = realloc(prog_tracker, (line_counter - 1) * sizeof(int));
                prog_tracker[line_counter - 2] = PID;
            } else {
                char *arguments[word_counter + 1];
                int counter;
                for (counter = 0; counter < word_counter; counter ++) {
                    arguments[counter] = lines[line_counter - 1][counter];
                }
                arguments[word_counter] = NULL;
                execv(arguments[0], arguments);//child processes implement processes in file.
                break;
            }
        }
    }
    free(lines);
    fclose(fp);
    if (line)
        free(line);
    if (PID != 0) {
        printf("parent running %d\n", getpid());
        int proc_num = 0;
        int prog_num = line_counter - 1;
        printf("prog_num: %d\n", prog_num);
        while (prog_num != 0) { //The while loop should break when all programs have finished, but it does not.
            kill(prog_tracker[proc_num], SIGCONT);
            usleep(tick_interval * 1000);
            kill(prog_tracker[proc_num], SIGSTOP);
            int stat;
            printf("status: %d", waitpid(prog_tracker[proc_num], &stat, WNOHANG)); //I now print out the return of waitpid.
            printf("%d\n", WIFEXITED(stat));
            if (WIFEXITED(stat)) {
                //printf("%d\n", WIFEXITED(stat));
                printf("program termination detected\n");
                prog_tracker[proc_num] = 0;
                prog_num -= 1;
                printf("processes left %d\n", prog_num);
            }
            // (proc_num + 1) avoids modifying proc_num twice in one expression
            proc_num = (proc_num + 1) % (line_counter - 1);
            while(prog_tracker[proc_num] == 0) {
                proc_num = (proc_num + 1) % (line_counter - 1);
            }
        }
        printf("All programs ended.");//This never gets printed!
    }
    return 0;