
In the shell I am developing, I execute a pipeline of commands A | B | C by forking one child for each command in the pipe. All 3 children have the same PGID as the first child; that is, the 3 children with PIDs x, y, z have PGID = x. All 3 commands execute correctly. In the SIGCHLD signal handler sigchld_handler(), I count the number of terminated children, and once the count reaches 3, I get the PGID so I can look up the job and remove it from the JobList. However, getpgid() returns -1 for all 3 PIDs x, y, z, i.e. getpgid(x), getpgid(y) and getpgid(z) all return -1 with errno 3 (ESRCH).

When I set the PGID of the children with setpgid() in the parent process, getpgid() worked fine and returned x. The problem occurs only in the signal handler. How can I get the PGID of a PID in the signal handler?

Here is the signal handler code:

void sigchld_handler(int s) {

    // declarations
    pid_t pid, pgid;
    // ...

    while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) {
        pgid = getpgid(pid);   // returns -1, but should return x
        // ...
    }
}

Meanwhile, in main() (in the parent process), after I do:

.
.
setpgid(x, x);
setpgid(y, x);
setpgid(z, x);
.
.


getpgid(x) returns x
getpgid(y) returns x
getpgid(z) returns x

Any help is greatly appreciated.

Thanks.

immortal
Shubs
  • What's the value of `errno` in case of failure? `ESRCH`? – alk Nov 04 '17 at 09:34
  • SIGCHLD is something you get when a child process terminates. It sort of makes sense you wouldn't be able to request PGID for a dead process... Note that you only run it after you've already reaped the process with `waitpid`... – immortal Nov 04 '17 at 09:46
  • @ESRCH - The value of errno is 3. – Shubs Nov 04 '17 at 10:05
  • @immortal - I need to remove the job from the JobList after all the children terminate normally. How do I remove them from the JobList without knowing their pgid? Is there another method or something I am missing? Also, I am not waiting in the parent process. I sigsuspend() in the parent process. When the child terminates, sigchld_handler() automatically gets called where I call waitpid() and reap the children one by one. This is where I call getpgid() i.e. just after getting pid from waitpid(). – Shubs Nov 04 '17 at 10:06
  • "*just after getting pid from waitpid()*" *immortal* is right. You cannot ask a dead man about his family. That's probably also the reason why `getpgid()` is not listed as being required to be async-signal-safe, so you may not even call it from a signal handler (for the list mentioned, see the [upper half of this page](http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html)). – alk Nov 04 '17 at 10:12
  • I do not know which process has terminated. I only find out when `waitpid(-1, &status, WNOHANG | WUNTRACED)` returns with `pid`, right? Otherwise, is there another way to find out which `pid` caused `sigchld_handler()` to run before I call `waitpid()`? – Shubs Nov 04 '17 at 10:17
  • @Shubs What JobList? Is this a data structure of your own making? Why not store them by PID? Or maybe create a mapping from PID to PGID internally after each fork? – immortal Nov 04 '17 at 10:19
  • @immortal - Yes, JobList is a data structure I made to store the running, suspended and bg jobs in a linked list. I am not storing them with PID since the 3 children are related. They should belong to the same process group for my job handling purposes. So I give them the same `pgid` = `pid` of the first child. I can do a mapping of `pid` to `pgid` myself, that is a good idea :). – Shubs Nov 04 '17 at 10:23
  • @immortal - Yeah you were correct. I was able to get the `pgid` from `pid` in `sigchld_handler()` for a suspended process. I think that is the correct answer. Thank you! – Shubs Nov 04 '17 at 10:44

1 Answer


SIGCHLD is a signal you get when a child process terminates. It sort of makes sense you wouldn't be able to request the PGID of a dead process... Note that you only call getpgid() after you've already reaped the process with waitpid, so the system can no longer find the requested PID to extract a PGID from it.

The error you get (3) is ESRCH:

#define ESRCH        3  /* No such process */

Which only strengthens this point: the PID is no longer valid. I recommend you create an internal mapping from PID to PGID and look it up internally in your process.

immortal