My program hangs when the number of child processes is large. I do not know what the problem could be, but I suspect the child processes are somehow blocked while running.

Here is the main workflow of the program:

void function(int process_num){

    int i;

    // initial variables for fork()
    int status = 0;
    pid_t child_pid[process_num], wpid;
    int *fds = malloc(sizeof(int) * process_num * 2);

    // initial pipes 
    for(i=0; i<process_num; i++){
        if(pipe(fds + i*2) <0)
            exit(0);
    }

    // start child processes to write
    for(i=0; i<process_num; i++){
        child_pid[i] =fork();

        if(child_pid[i] == 0){
            close(fds[i*2]);
            // do something ...
            // write(fds[i*2+1], something);
            close(fds[i*2+1]);
            exit(0);
        }else if(child_pid[i] == -1){
            printf("fork error\n");
            exit(0);
        }
    }

    // parent waits child processes and reads
    for(i=0; i<process_num; i++){

      pid_t cpid = waitpid(child_pid[i], &status, 0);
      if (WIFEXITED(status)){
        close(fds[i*2+1]);
        // do something ...
        // read(fds[i*2], something);
       close(fds[i*2]);
      }
    }
    free(fds);
    while((wpid = wait(&status)) > 0);
}

I checked the status of the processes with htop. Several child processes (e.g. 8 when process_num was 110) were left in state S.

My question: if the number of child processes is greater than the number of processors, will the child processes block when pipes are used for communication between the children and the parent (the parent waits until all child processes have finished)? Thanks a lot!

EDIT: I printed the fd numbers used by read() and write() and found that reading began at 4 and writing at 5. I have no idea why that is the case. Does anybody know?

AKX
heisthere
  • No, the number of processors does not limit this. You can have (e.g.) 100 processes on a single core machine--no problem. More likely, blockage is due to a bug in your code. Specifically, the _parent_ process should close the fds for middle parts of the pipe _before_ doing any waits. I ran your posted program and it completes in a fraction of a second, so how close is your posted code to your actual program? – Craig Estey Nov 28 '21 at 17:34
  • You definitely have a bug. When I set the number of processes to a small number (e.g.) 10 but set the buffer write length to 100,000 I get blocked. Where are the `read` calls? At each stage `i`, you must read from `fds[(i - 1) * 2]` and write to `fds[i * 2 + 1]`. The first stage is special (e.g.) read from some file. And, the last stage is special (e.g.) write to stdout. I'd use a `struct` to control each stage. For an example of a working pipe [within a custom shell], see my answer: https://stackoverflow.com/questions/52823093/fd-leak-custom-shell/52825582#52825582 – Craig Estey Nov 28 '21 at 17:57
  • @Craig Estey Hey, thanks a lot! I found that there is indeed a bug with read(). My read function throws an error for one of the fds, where the data has a length of 0. `fds[(i - 1) * 2]` is correct instead of `fds[i*2]` when reading? But what happens when `i =0` ? – heisthere Nov 28 '21 at 18:06
  • Like I said, first stage (i.e. `i == 0`) is special. There is _no_ [valid] `fds` entry for `i - 1`. And, likewise no valid output value for the last stage. What you're doing is the equiv of a shell pipe: `| cat | cat | cat |` instead of `cat < infile | cat | cat > outfile` – Craig Estey Nov 28 '21 at 18:10
  • @CraigEstey Oh sorry, I accidentally skipped the part you mentioned about i == 0. But I read https://tldp.org/LDP/lpg/node11.html and followed its example, which uses i*2 for reading and i*2 + 1 for writing – heisthere Nov 28 '21 at 18:25
  • The best thing I can suggest is to look at the link I posted for an old answer of mine. The `pipefork` function would be of particular interest. It is well commented and should help. Where it does `execvp`, is where your `read/write` loop should go – Craig Estey Nov 28 '21 at 18:31
  • @heisthere Did you just edit your question to be entirely different, with the same code? – AKX Nov 30 '21 at 14:25
  • @AKX Yes, because the problem was not the one that I thought – heisthere Nov 30 '21 at 14:26
  • Then you ask another question. Changing this question to another invalidates all the discussion in the comments, don't you think? – AKX Nov 30 '21 at 14:26
  • But that's why I want to keep this post. Because comments above indicate part of the problem I am describing. Otherwise it would be just a duplicate post I think – heisthere Nov 30 '21 at 14:28
  • Either way, the fd numbers have no significance. 0, 1, 2 are stdio streams. 3 could be reserved for something, but again, it does not matter. – AKX Nov 30 '21 at 14:32
  • @heisthere I reverted the edit to the original form, but added your extra question at the end. The comments about numbers of processors etc. wouldn't make any sense with your question about fd numbers. – AKX Nov 30 '21 at 14:41
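
On the fd numbers raised in the EDIT and in the comments: POSIX allocates the lowest-numbered descriptors that are currently free, so the values returned by pipe() only reflect what the process already had open. The following is a minimal sketch of that behaviour, not part of the original program; `demo_pipe_fds` is a hypothetical helper name, and the sketch assumes a single-threaded process so that nothing else opens or closes descriptors in between.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create a pipe, record its descriptor numbers,
 * close it, then create a second pipe and record those numbers too.
 * Returns 0 on success, -1 if pipe() fails. */
int demo_pipe_fds(int first[2], int second[2])
{
    assert(first != NULL && second != NULL);

    if (pipe(first) < 0)
        return -1;
    printf("first pipe:  read end = %d, write end = %d\n", first[0], first[1]);

    /* Closing both ends frees their numbers for reuse. */
    close(first[0]);
    close(first[1]);

    if (pipe(second) < 0)
        return -1;
    printf("second pipe: read end = %d, write end = %d\n", second[0], second[1]);

    close(second[0]);
    close(second[1]);
    return 0;
}
```

In a process where only 0, 1 and 2 are open, the first pipe would typically get 3 and 4. Seeing 4 and 5 instead suggests descriptor 3 was already in use when pipe() was called, which is harmless.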

1 Answer

The number of processors has no impact here. The operating system can still schedule any process that has work to do. This is purely a software problem: all the processes are in the sleep state (S), waiting for an event that never happens.
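
One plausible way for this pattern to leave every process asleep, offered as an assumption about the real program rather than a diagnosis: a child blocks in write() once its pipe buffer is full, the parent blocks in waitpid() before it has read anything, and each child still holds inherited copies of the other pipes' ends, so read() cannot see end-of-file. The sketch below reorders the work so that the parent closes all write ends, drains every pipe, and only then waits. `run_children`, `bytes_per_child`, and the filler payload are hypothetical stand-ins for the "do something" parts of the question, and cleanup on error paths is omitted for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork process_num children, each writing
 * bytes_per_child filler bytes into its own pipe. Returns the total
 * number of bytes the parent read, or -1 on error. */
long run_children(int process_num, size_t bytes_per_child)
{
    assert(process_num > 0);

    int *fds = malloc(sizeof(int) * process_num * 2);
    pid_t *child_pid = malloc(sizeof(pid_t) * process_num);
    long total = 0;
    int i, j;

    if (fds == NULL || child_pid == NULL)
        return -1;

    for (i = 0; i < process_num; i++) {
        if (pipe(fds + i * 2) < 0)
            return -1;
    }

    for (i = 0; i < process_num; i++) {
        child_pid[i] = fork();
        if (child_pid[i] == -1)
            return -1;

        if (child_pid[i] == 0) {
            /* Child: keep only its own write end; close everything else,
             * including the inherited ends of the other children's pipes. */
            for (j = 0; j < process_num * 2; j++) {
                if (j != i * 2 + 1)
                    close(fds[j]);
            }

            char buf[4096];
            memset(buf, 'x', sizeof buf);
            size_t left = bytes_per_child;
            while (left > 0) {
                size_t chunk = left < sizeof buf ? left : sizeof buf;
                ssize_t n = write(fds[i * 2 + 1], buf, chunk);
                if (n <= 0)
                    _exit(1);
                left -= (size_t)n;
            }
            close(fds[i * 2 + 1]);
            _exit(0);
        }
    }

    /* Parent: close every write end first, so that read() returns 0
     * (end-of-file) once the owning child has finished writing. */
    for (i = 0; i < process_num; i++)
        close(fds[i * 2 + 1]);

    /* Drain each pipe before waiting, so no child stays blocked in write(). */
    for (i = 0; i < process_num; i++) {
        char buf[4096];
        ssize_t n;
        while ((n = read(fds[i * 2], buf, sizeof buf)) > 0)
            total += n;
        close(fds[i * 2]);
    }

    /* Only now reap the children. */
    for (i = 0; i < process_num; i++) {
        int status = 0;
        waitpid(child_pid[i], &status, 0);
    }

    free(fds);
    free(child_pid);
    return total;
}
```

Calling `run_children(10, 100000)` from `main` exercises the case mentioned in the comments (few processes, large writes). Reading the pipes one after another is the simplest choice here; a program that needs to service all children concurrently could use poll() or select() instead.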

Eric Marchand