
I am trying to write a program that takes a few processes as its arguments. The parent process then executes each child process and prints out a few statistics about it.

Example: /generate ls -l would run ls -l and print some statistics about it (specifically its system time, user time and number of context switches).

Instead of using the getrusage() function, I would like to get the necessary information from the /proc file system. My understanding is that if I call wait(), the child's information is removed from /proc. I have included my code below:

#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */

#include <time.h>
#include <stdbool.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/resource.h>

/* Print the command being run (argv[1] onwards). */
void inputted_command(int a, char **b)
{
    for (int i = 1; i < a; i++)
        printf("%s ", b[i]);
}

int main(int argc, char **argv)
{
    int status;
    pid_t childpid;
    pid_t get_information;

    if (argc < 2)
        return 1;

    bool handle_signals = (signal(SIGINT, SIG_IGN) != SIG_IGN);

    /* Wall-clock ("real") time, measured around the child's lifetime. */
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    pid_t pid = fork();

    if (pid < 0)
    {
        printf("fork: error no = %s\n", strerror(errno));
        return 1;
    }
    else if (pid > 0)
    {
        signal(SIGINT, SIG_IGN);

        sleep(60);

        /*
        get_information = fork();
        if (get_information == 0) {
            execlp(___);
        } else
            waitpid(pid, &status, 0);
        */
        waitpid(pid, &status, 0);   /* wait for the child created above */

        clock_gettime(CLOCK_MONOTONIC, &end);
        double real_time_taken = (end.tv_sec - start.tv_sec)
                               + (end.tv_nsec - start.tv_nsec) / 1e9;

        printf("The command ");
        inputted_command(argc, argv);
        if (WIFSIGNALED(status))
        {
            printf("is interrupted by the signal number = %d (Insert Name Here) real: %.2f, user: , system: , context switch:  \n",
                   WTERMSIG(status), real_time_taken);
        }
        else
        {
            printf("terminated with return status code = %d real: %.2f, user: , system: , context switch:  \n",
                   WEXITSTATUS(status), real_time_taken);
        }
    }
    else if (pid == 0)
    {
        childpid = getpid();
        printf("Process with id: %d created for the command: ", (int)childpid);
        inputted_command(argc, argv);
        printf("\n");
        fflush(stdout);   /* exec discards unflushed stdio buffers */
        assert(pid == 0);
        if (handle_signals)
            signal(SIGINT, SIG_DFL);
        execvp(argv[1], &argv[1]);
        printf(" experienced an error in starting the command: ");
        inputted_command(argc, argv);
        printf("\n");
        exit(-1);
    }
}

  • A portion of my code has been commented out because I'm unsure how to go about doing it.
  • My idea here is to first let the parent process go to sleep so the child process finishes running.
  • Then the parent process creates a new child process to access the /proc file system and get the necessary data (this is the commented-out part).
  • Finally, I call the wait function again to reap (clean up) the initial child process.

So my main question is whether this is an appropriate way to get information about the child process, and how I should go about getting that information (mainly the system time, user time, and the voluntary and involuntary context switches)?

Jerrico Kyle
  • Please fix the formatting of your code. What kind of data do you need exactly? You're most likely going to end up with a data race the way you're doing it. `ptrace` might be a better option, depending on what information you need. – nullp0tr Oct 21 '18 at 18:11
    The information in `/proc` vanishes shortly after the process terminates — but I haven't investigated whether it vanishes immediately, or only after it has been waited for, or what. Sleeping is probably not appropriate. Waiting is more likely to be appropriate. You may want to look into [`sigaction()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaction.html) and `SA_SIGINFO` and the information you can get back via that when catching `SIGCHLD` — I'm not sure whether the context switch information is available. – Jonathan Leffler Oct 21 '18 at 18:14
  • `getusage` ? I think you meant `getrusage` ? – KamilCuk Oct 21 '18 at 18:17
  • @Jonathan Leffler My understanding is /proc//stat contains the system/user time and /proc//status contains the context switches – Jerrico Kyle Oct 21 '18 at 18:25
  • @KamilCuk Yes. Updated. – Jerrico Kyle Oct 21 '18 at 18:25
    @JerricoKyle: That sounds plausible. The question is not so much "what contains the information" as "how long does the information survive after the process terminates"? If process 3291 has exited, what is left in `/proc/3291` and how long does it remain there? When does it get removed? Some simple searching would (probably) tell me — but I'm not trying to answer your question (these are comments, not an answer). I am encouraging you to do the relevant research. There's probably a simple rule; it's probably sensible; it probably means that what you want to do can be done. – Jonathan Leffler Oct 21 '18 at 18:29
  • @JonathanLeffler Just wrote a program to test this. The `/proc` directory stays until the `wait` is done (even if the wait is done 5 seconds later). But, it's in an indeterminate state. Prior to the child exiting, the files are owned by the invoking user, but afterwards they are owned by root. Post child exit `/proc//status` was readable and appeared valid – Craig Estey Oct 21 '18 at 18:30
  • @CraigEstey: Thanks. I'm (mildly) curious about the timing issues. If the wait-family call releases the information in `/proc` for the PID, then how does the process that launched the PID get to read the `status` pseudo-file after it knows the child terminated. Does it need to open the status file before the wait, and then read it once the wait returns? Or is it less devious than that? There should be some documentation on the `/proc` file system — [`proc(5)`](http://man7.org/linux/man-pages/man5/proc.5.html) at http://man7.org for example — which should cover these details. – Jonathan Leffler Oct 21 '18 at 18:37
  • 'Tis curious that you include `` twice, and yet don't include `` at all. You really don't need `` these days; POSIX has not required that since POSIX 2004 (and the SUS — Single Unix Specification — never did need it, which is why POSIX changed). – Jonathan Leffler Oct 21 '18 at 20:34

2 Answers


Here's a test program that I cooked up that may shed some light (Caveat: it is somewhat crude):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>

int mode = -1;
pid_t pid;
time_t cur;
FILE *xf;
char dirproc[100];
char dirfd[100];
char status_file[100];
char cmd[100];

void
xsystem(char *cmd)
{

    printf("CMD: %s\n",cmd);
    system(cmd);
}

int
check(int nmode)
{
    struct stat st;
    int errst;
    int errkill;
    char buf[500];

    errkill = kill(pid,0);
    errst = stat(dirproc,&st);

    if (nmode != mode) {
        printf("elap=%ld errkill=%d errst=%d\n",(long)cur,errkill,errst);

        sprintf(cmd,"ls -l /proc/%d",pid);
        xsystem(cmd);

        sprintf(cmd,"ls -l /proc/%d/fd",pid);
        xsystem(cmd);

        sprintf(cmd,"cat /proc/%d/status",pid);
        xsystem(cmd);

        printf("fgets\n");

        rewind(xf);
        while (1) {
            char *cp = fgets(buf,sizeof(buf),xf);
            if (cp == NULL)
                break;
            fputs(buf,stdout);
        }

        mode = nmode;
    }

    return errkill;
}

// main -- main program
int
main(int argc,char **argv)
{
    char *cp;

    --argc;
    ++argv;

    for (;  argc > 0;  --argc, ++argv) {
        cp = *argv;
        if (*cp != '-')
            break;

        switch (cp[1]) {
        default:
            break;
        }
    }

    setlinebuf(stdout);
    setlinebuf(stderr);

    pid = fork();

    if (pid == 0) {
        open("/dev/null",O_RDONLY);
        sleep(1);
        exit(0);
    }

    sprintf(dirproc,"/proc/%d",pid);
    sprintf(dirfd,"/proc/%d/fd",pid);
    sprintf(status_file,"/proc/%d/status",pid);
    xf = fopen(status_file,"r");
    if (xf == NULL) {
        perror(status_file);
        return 1;
    }

    time_t beg = time(NULL);
    cur = 0;

    while (1) {
        cur = time(NULL);

        cur -= beg;
        if (cur >= 4)
            break;

        check(1);
    }

    printf("\n");
    printf("postloop\n");
    check(2);

    waitpid(pid,NULL,0);
    printf("\n");
    printf("postwait\n");
    check(3);

    return 0;
}

Here is the program output:

elap=0 errkill=0 errst=0
CMD: ls -l /proc/94913
total 0
dr-xr-xr-x. 2 xxx xxx 0 Oct 21 15:50 attr
-rw-r--r--. 1 xxx xxx 0 Oct 21 15:50 autogroup
-r--------. 1 xxx xxx 0 Oct 21 15:50 auxv
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 cgroup
--w-------. 1 xxx xxx 0 Oct 21 15:50 clear_refs
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 cmdline
-rw-r--r--. 1 xxx xxx 0 Oct 21 15:50 comm
-rw-r--r--. 1 xxx xxx 0 Oct 21 15:50 coredump_filter
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 cpuset
lrwxrwxrwx. 1 xxx xxx 0 Oct 21 15:50 cwd -> /tmp/cld
-r--------. 1 xxx xxx 0 Oct 21 15:50 environ
lrwxrwxrwx. 1 xxx xxx 0 Oct 21 15:50 exe -> /tmp/cld/pgm2
dr-x------. 2 xxx xxx 0 Oct 21 15:50 fd
dr-x------. 2 xxx xxx 0 Oct 21 15:50 fdinfo
-rw-r--r--. 1 xxx xxx 0 Oct 21 15:50 gid_map
-r--------. 1 xxx xxx 0 Oct 21 15:50 io
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 latency
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 limits
-rw-r--r--. 1 xxx xxx 0 Oct 21 15:50 loginuid
dr-x------. 2 xxx xxx 0 Oct 21 15:50 map_files
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 maps
-rw-------. 1 xxx xxx 0 Oct 21 15:50 mem
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 mountinfo
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 mounts
-r--------. 1 xxx xxx 0 Oct 21 15:50 mountstats
dr-xr-xr-x. 6 xxx xxx 0 Oct 21 15:50 net
dr-x--x--x. 2 xxx xxx 0 Oct 21 15:50 ns
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 numa_maps
-rw-r--r--. 1 xxx xxx 0 Oct 21 15:50 oom_adj
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 oom_score
-rw-r--r--. 1 xxx xxx 0 Oct 21 15:50 oom_score_adj
-r--------. 1 xxx xxx 0 Oct 21 15:50 pagemap
-r--------. 1 xxx xxx 0 Oct 21 15:50 personality
-rw-r--r--. 1 xxx xxx 0 Oct 21 15:50 projid_map
lrwxrwxrwx. 1 xxx xxx 0 Oct 21 15:50 root -> /
-rw-r--r--. 1 xxx xxx 0 Oct 21 15:50 sched
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 schedstat
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 sessionid
-rw-r--r--. 1 xxx xxx 0 Oct 21 15:50 setgroups
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 smaps
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 smaps_rollup
-r--------. 1 xxx xxx 0 Oct 21 15:50 stack
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 stat
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 statm
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 status
-r--------. 1 xxx xxx 0 Oct 21 15:50 syscall
dr-xr-xr-x. 3 xxx xxx 0 Oct 21 15:50 task
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 timers
-rw-rw-rw-. 1 xxx xxx 0 Oct 21 15:50 timerslack_ns
-rw-r--r--. 1 xxx xxx 0 Oct 21 15:50 uid_map
-r--r--r--. 1 xxx xxx 0 Oct 21 15:50 wchan
CMD: ls -l /proc/94913/fd
total 0
lrwx------. 1 xxx xxx 64 Oct 21 15:50 0 -> /dev/pts/0
l-wx------. 1 xxx xxx 64 Oct 21 15:50 1 -> /tmp/out2
l-wx------. 1 xxx xxx 64 Oct 21 15:50 2 -> /tmp/out2
lr-x------. 1 xxx xxx 64 Oct 21 15:50 3 -> /dev/null
CMD: cat /proc/94913/status
Name:   pgm2
Umask:  0022
State:  S (sleeping)
Tgid:   94913
Ngid:   0
Pid:    94913
PPid:   94912
TracerPid:  0
Uid:    500 500 500 500
Gid:    500 500 500 500
FDSize: 64
Groups: 500
NStgid: 94913
NSpid:  94913
NSpgid: 94912
NSsid:  3771
VmPeak:     4136 kB
VmSize:     4136 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:        80 kB
VmRSS:        80 kB
RssAnon:          80 kB
RssFile:           0 kB
RssShmem:          0 kB
VmData:       44 kB
VmStk:       132 kB
VmExe:         8 kB
VmLib:      1872 kB
VmPTE:        52 kB
VmSwap:        0 kB
HugetlbPages:          0 kB
CoreDumping:    0
Threads:    1
SigQ:   0/47895
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp:    0
Speculation_Store_Bypass:   thread vulnerable
Cpus_allowed:   ff
Cpus_allowed_list:  0-7
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:  0
voluntary_ctxt_switches:    1
nonvoluntary_ctxt_switches: 0
fgets
Name:   pgm2
Umask:  0022
State:  S (sleeping)
Tgid:   94913
Ngid:   0
Pid:    94913
PPid:   94912
TracerPid:  0
Uid:    500 500 500 500
Gid:    500 500 500 500
FDSize: 64
Groups: 500
NStgid: 94913
NSpid:  94913
NSpgid: 94912
NSsid:  3771
VmPeak:     4136 kB
VmSize:     4136 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:        80 kB
VmRSS:        80 kB
RssAnon:          80 kB
RssFile:           0 kB
RssShmem:          0 kB
VmData:       44 kB
VmStk:       132 kB
VmExe:         8 kB
VmLib:      1872 kB
VmPTE:        52 kB
VmSwap:        0 kB
HugetlbPages:          0 kB
CoreDumping:    0
Threads:    1
SigQ:   0/47895
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp:    0
Speculation_Store_Bypass:   thread vulnerable
Cpus_allowed:   ff
Cpus_allowed_list:  0-7
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:  0
voluntary_ctxt_switches:    1
nonvoluntary_ctxt_switches: 0

postloop
elap=4 errkill=0 errst=0
CMD: ls -l /proc/94913
ls: cannot read symbolic link '/proc/94913/cwd': No such file or directory
ls: cannot read symbolic link '/proc/94913/root': No such file or directory
ls: cannot read symbolic link '/proc/94913/exe': No such file or directory
total 0
dr-xr-xr-x. 2 xxx  xxx  0 Oct 21 15:50 attr
-rw-r--r--. 1 root root 0 Oct 21 15:50 autogroup
-r--------. 1 root root 0 Oct 21 15:50 auxv
-r--r--r--. 1 root root 0 Oct 21 15:50 cgroup
--w-------. 1 root root 0 Oct 21 15:50 clear_refs
-r--r--r--. 1 root root 0 Oct 21 15:50 cmdline
-rw-r--r--. 1 root root 0 Oct 21 15:50 comm
-rw-r--r--. 1 root root 0 Oct 21 15:50 coredump_filter
-r--r--r--. 1 root root 0 Oct 21 15:50 cpuset
lrwxrwxrwx. 1 root root 0 Oct 21 15:50 cwd
-r--------. 1 root root 0 Oct 21 15:50 environ
lrwxrwxrwx. 1 root root 0 Oct 21 15:50 exe
dr-x------. 2 root root 0 Oct 21 15:50 fd
dr-x------. 2 root root 0 Oct 21 15:50 fdinfo
-rw-r--r--. 1 root root 0 Oct 21 15:50 gid_map
-r--------. 1 root root 0 Oct 21 15:50 io
-r--r--r--. 1 root root 0 Oct 21 15:50 latency
-r--r--r--. 1 root root 0 Oct 21 15:50 limits
-rw-r--r--. 1 root root 0 Oct 21 15:50 loginuid
dr-x------. 2 root root 0 Oct 21 15:50 map_files
-r--r--r--. 1 root root 0 Oct 21 15:50 maps
-rw-------. 1 root root 0 Oct 21 15:50 mem
-r--r--r--. 1 root root 0 Oct 21 15:50 mountinfo
-r--r--r--. 1 root root 0 Oct 21 15:50 mounts
-r--------. 1 root root 0 Oct 21 15:50 mountstats
dr-xr-xr-x. 2 xxx  xxx  0 Oct 21 15:50 net
dr-x--x--x. 2 root root 0 Oct 21 15:50 ns
-r--r--r--. 1 root root 0 Oct 21 15:50 numa_maps
-rw-r--r--. 1 root root 0 Oct 21 15:50 oom_adj
-r--r--r--. 1 root root 0 Oct 21 15:50 oom_score
-rw-r--r--. 1 root root 0 Oct 21 15:50 oom_score_adj
-r--------. 1 root root 0 Oct 21 15:50 pagemap
-r--------. 1 root root 0 Oct 21 15:50 personality
-rw-r--r--. 1 root root 0 Oct 21 15:50 projid_map
lrwxrwxrwx. 1 root root 0 Oct 21 15:50 root
-rw-r--r--. 1 root root 0 Oct 21 15:50 sched
-r--r--r--. 1 root root 0 Oct 21 15:50 schedstat
-r--r--r--. 1 root root 0 Oct 21 15:50 sessionid
-rw-r--r--. 1 root root 0 Oct 21 15:50 setgroups
-r--r--r--. 1 root root 0 Oct 21 15:50 smaps
-r--r--r--. 1 root root 0 Oct 21 15:50 smaps_rollup
-r--------. 1 root root 0 Oct 21 15:50 stack
-r--r--r--. 1 root root 0 Oct 21 15:50 stat
-r--r--r--. 1 root root 0 Oct 21 15:50 statm
-r--r--r--. 1 root root 0 Oct 21 15:50 status
-r--------. 1 root root 0 Oct 21 15:50 syscall
dr-xr-xr-x. 3 xxx  xxx  0 Oct 21 15:50 task
-r--r--r--. 1 root root 0 Oct 21 15:50 timers
-rw-rw-rw-. 1 root root 0 Oct 21 15:50 timerslack_ns
-rw-r--r--. 1 root root 0 Oct 21 15:50 uid_map
-r--r--r--. 1 root root 0 Oct 21 15:50 wchan
CMD: ls -l /proc/94913/fd
ls: cannot open directory '/proc/94913/fd': Permission denied
CMD: cat /proc/94913/status
Name:   pgm2
State:  Z (zombie)
Tgid:   94913
Ngid:   0
Pid:    94913
PPid:   94912
TracerPid:  0
Uid:    500 500 500 500
Gid:    500 500 500 500
FDSize: 0
Groups: 500
NStgid: 94913
NSpid:  94913
NSpgid: 94912
NSsid:  3771
Threads:    1
SigQ:   0/47895
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp:    0
Speculation_Store_Bypass:   thread vulnerable
Cpus_allowed:   ff
Cpus_allowed_list:  0-7
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:  0
voluntary_ctxt_switches:    2
nonvoluntary_ctxt_switches: 0
fgets
Name:   pgm2
State:  Z (zombie)
Tgid:   94913
Ngid:   0
Pid:    94913
PPid:   94912
TracerPid:  0
Uid:    500 500 500 500
Gid:    500 500 500 500
FDSize: 0
Groups: 500
NStgid: 94913
NSpid:  94913
NSpgid: 94912
NSsid:  3771
Threads:    1
SigQ:   0/47895
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp:    0
Speculation_Store_Bypass:   thread vulnerable
Cpus_allowed:   ff
Cpus_allowed_list:  0-7
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:  0
voluntary_ctxt_switches:    2
nonvoluntary_ctxt_switches: 0

postwait
elap=4 errkill=-1 errst=-1
CMD: ls -l /proc/94913
ls: cannot access '/proc/94913': No such file or directory
CMD: ls -l /proc/94913/fd
ls: cannot access '/proc/94913/fd': No such file or directory
CMD: cat /proc/94913/status
cat: /proc/94913/status: No such file or directory
fgets

Thanks. I'm (mildly) curious about the timing issues. If the wait-family call releases the information in /proc for the PID, then how does the process that launched the PID get to read the status pseudo-file after it knows the child terminated.

Without using wait, the parent can't know definitively/easily/cleanly that the child has exited, because the alternative way to check for a live process (e.g. kill(pid,0)) still returns 0 while the child is a zombie. This was [somewhat] surprising to me.

Based on the test program output, one way that might work is to call readlink on /proc/pid/cwd and check for an error (i.e. an error means the process has exited and is in the zombie state).

Another way is to read /proc/pid/status and look for: State: Z (zombie)

Does it need to open the status file before the wait, and then read it once the wait returns?

After the wait is done, even a pre-opened stream on /proc/pid/status returns EOF. So, no joy.

Or is it less devious than that? There should be some documentation on the /proc file system — proc(5) at man7.org for example — which should cover these details.

The man page does document some other files that change when the process becomes a zombie (e.g. /proc/pid/cmdline, which is empty for a zombie).

Craig Estey
  • Points for effort. I suspect that `sigaction()` plus `SIGCHLD` plus `SA_SIGINFO` allows you to know that a child died, and which child it was, and therefore to interrogate `/proc/PID` before calling `wait()` to make the sleeping dead (zombie) into the truly dead process (no entries left in `/proc` for the PID). It requires a bit of coding, though. – Jonathan Leffler Oct 21 '18 at 20:13
  • @JonathanLeffler Thanks. Unlike you, I need all the points I can get :-) I modified the program and `SIGCHLD` _does_ clue the parent process without needing to `wait`. I'm pre-coffee here today, so that's my only excuse. `siginfo_t` doesn't seem to have quite as much info as `/proc` [nor does `ucontext_t`]. So, doing `sigaction`, followed by a combination of all three may produce the most info. – Craig Estey Oct 21 '18 at 20:56
  • Thanks to your prompting, I've done some extra work to see how the SA_SIGINFO stuff works — and added my [answer](https://stackoverflow.com/a/52920568/15168). It's interesting chasing through the POSIX specification; it's something I've been meaning to do for a number of months, now, and I finally got spurred into action. – Jonathan Leffler Oct 21 '18 at 22:51

With some impetus from Craig Estey's answer, and following on from my comment, I used the POSIX information for sigaction() (which points to Signal Actions and <signal.h>) to come up with the following code. It uses SA_SIGINFO handling for the SIGCHLD signal, which allows the program to glean information from the /proc file system for the child process after it has terminated but before it has been waited for.

siginfo47.c

#define _XOPEN_SOURCE 700

#include "stderr.h"
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;
static siginfo_t child_info = { 0 };

static void sigchld(int signum, siginfo_t *info, void *ctxt)
{
    assert(info != 0);
    assert(ctxt != 0);
    assert(signum == info->si_signo);
    got_signal = signum;
    child_info = *info;
}

struct si_code_names
{
    int     si_code;
    char    si_code_name[16];
    char    si_code_meaning[64];
};

static struct si_code_names si_codes[] =
{
    [CLD_EXITED]    = { CLD_EXITED,    "CLD_EXITED",    "Child has exited." },
    [CLD_KILLED]    = { CLD_KILLED,    "CLD_KILLED",    "Child has terminated abnormally and did not create a core file." },
    [CLD_DUMPED]    = { CLD_DUMPED,    "CLD_DUMPED",    "Child has terminated abnormally and created a core file." },
    [CLD_TRAPPED]   = { CLD_TRAPPED,   "CLD_TRAPPED",   "Traced child has trapped." },
    [CLD_STOPPED]   = { CLD_STOPPED,   "CLD_STOPPED",   "Child has stopped." },
    [CLD_CONTINUED] = { CLD_CONTINUED, "CLD_CONTINUED", "Stopped child has continued." },
};

static void cat_proc_file(int pid, const char *base)
{
    char buffer[1024];
    int rc;
    rc = snprintf(buffer, sizeof(buffer), "/proc/%d/%s", pid, base);
    if (rc < 0 || rc >= (int)sizeof(buffer))
        err_error("snprintf() failed - can't happen!?!\n");
    int fd = open(buffer, O_RDONLY);
    if (fd < 0)
        err_syserr("failed to open file '%s' for reading: ", buffer);
    printf("Contents of %s:\n", buffer);
    int nbytes;
    while ((nbytes = read(fd, buffer, sizeof(buffer))) > 0)
        printf("%*.*s", nbytes, nbytes, buffer);
    putchar('\n');
    fflush(stdout);
    close(fd);
}

int main(int argc, char **argv)
{
    char *cmdv[] = { "ls", "-l", 0 };
    err_setarg0(argv[0]);
    if (argc <= 1)
    {
        argc = 2;
        argv = cmdv;
    }
    else
    {
        argc--;
        argv++;
    }

    pid_t pid = fork();

    if (pid < 0)
        err_syserr("failed to fork: ");
    else if (pid > 0)
    {
        struct sigaction sa = { 0 };
        sa.sa_sigaction = sigchld;
        sa.sa_flags = SA_SIGINFO;
        if (sigaction(SIGCHLD, &sa, 0) != 0)
            err_syserr("failed to set signal handling: ");

        printf("Parent PID %d: pausing while PID %d runs\n", (int)getpid(), (int)pid);
        fflush(stdout);
        pause();
        printf("Parent PID %d: unpaused\n", (int)getpid());

        printf("Stashed information:\n");
        printf("  Signal:       %d\n", got_signal);
        printf("  si_signo:     %d\n", child_info.si_signo);
        printf("  si_code:      %d\n", child_info.si_code);
        if (child_info.si_signo == SIGCHLD)
        {
            struct si_code_names *code = &si_codes[child_info.si_code];
            printf("                [%s] %s\n", code->si_code_name,
                   code->si_code_meaning);
        }
        printf("  si_pid:       %d\n", (int)child_info.si_pid);
        printf("  si_uid:       %d\n", (int)child_info.si_uid);
        printf("  si_addr:      0x%12" PRIXPTR "\n", (uintptr_t)child_info.si_addr);
        printf("  si_status:    %d\n", child_info.si_code);
        printf("  si_value.int: %d\n", child_info.si_value.sival_int);

        cat_proc_file(pid, "stat");
        cat_proc_file(pid, "status");

        int status;
        int corpse;
        if ((corpse = waitpid(pid, &status, 0)) != pid)
            err_syserr("failed to wait for child %d", pid);

        if (WIFSIGNALED(status))
            printf("PID %d died from signal number = %d (0x%.4X)\n",
                   corpse, WTERMSIG(status), status);
        else if (WIFEXITED(status))
            printf("PID %d exited normally with status = %d (0x%.4X)\n",
                   corpse, WEXITSTATUS(status), status);
        else
            printf("PID %d was neither signalled nor exited normally (0x%.4X)\n",
                   corpse, status);
    }
    else if (pid == 0)
    {
        printf("PID: %d:", (int)getpid());
        for (int i = 0; argv[i] != 0; i++)
            printf(" %s", argv[i]);
        putchar('\n');
        fflush(stdout);
        execvp(argv[0], &argv[0]);
        err_syserr("failed to execute %s: ", argv[0]);
        /*NOTREACHED*/
    }
}

Some of this code is available in my SOQ (Stack Overflow Questions) repository on GitHub. Specifically, the files stderr.c and stderr.h can be found in the src/libsoq sub-directory. They greatly simplify the error reporting.

Example runs include:

$ siginfo47
Parent PID 15016: pausing while PID 15017 runs
PID: 15017: ls -l
total 400
drwxr-xr-x   2 jleffler pd  4096 Oct 21 15:16 bin
drwxr-xr-x   5 jleffler pd   256 Oct 21 15:15 doc
drwxr-xr-x   2 jleffler pd  4096 Oct 21 15:15 etc
drwxr-xr-x   2 jleffler pd  4096 Oct 21 15:16 inc
drwxr-xr-x   2 jleffler pd   256 Oct 21 15:16 lib
-rw-r--r--   1 jleffler pd 22072 Oct 21 15:15 LICENSE.md
-rw-r--r--   1 jleffler pd   390 Oct 21 15:15 makefile
drwxr-xr-x   2 jleffler pd   256 Oct 21 15:15 packages
-rw-r--r--   1 jleffler pd  2694 Oct 21 15:15 README.md
-rwxr-xr-x   1 jleffler pd 64968 Oct 21 15:17 siginfo41
-rw-r--r--   1 jleffler pd  5990 Oct 21 15:17 siginfo41.c
-rwxr-xr-x   1 jleffler pd 66104 Oct 21 15:34 siginfo47
-rw-r--r--   1 jleffler pd  7417 Oct 21 15:33 siginfo47.c
drwxr-xr-x 230 jleffler pd  8192 Oct 21 15:15 src
Parent PID 15016: unpaused
Stashed information:
  Signal:       17
  si_signo:     17
  si_code:      1
                [CLD_EXITED] Child has exited.
  si_pid:       15017
  si_uid:       9508
  si_addr:      0x252400003AA9
  si_status:    1
  si_value.int: 0
Contents of /proc/15017/stat:
15017 (ls) Z 15016 15016 13211 34827 15016 4227084 452 0 0 0 0 0 0 0 20 0 1 0 511347844 0 0 18446744073709551615 0 0 0 0 0 0 0 0 0 18446744073709551615 0 0 17 6 0 0 0 0 0 0 0 0 0 0 0 0 0

Contents of /proc/15017/status:
Name:   ls
State:  Z (zombie)
Tgid:   15017
Ngid:   0
Pid:    15017
PPid:   15016
TracerPid:  0
Uid:    9508    9508    9508    9508
Gid:    1240    1240    1240    1240
FDSize: 0
Groups: 297 1240 1360 8714 
Threads:    1
SigQ:   0/71487
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
CapAmb: 0000000000000000
Seccomp:    0
Cpus_allowed:   ff
Cpus_allowed_list:  0-7
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:  0
voluntary_ctxt_switches:    3
nonvoluntary_ctxt_switches: 1

PID 15017 exited normally with status = 0 (0x0000)
$ siginfo47 exitcode 23
Parent PID 15032: pausing while PID 15033 runs
PID: 15033: exitcode 23
Parent PID 15032: unpaused
Stashed information:
  Signal:       17
  si_signo:     17
  si_code:      1
                [CLD_EXITED] Child has exited.
  si_pid:       15033
  si_uid:       9508
  si_addr:      0x252400003AB9
  si_status:    1
  si_value.int: 23
Contents of /proc/15033/stat:
15033 (exitcode) Z 15032 15032 13211 34827 15032 4227084 179 0 0 0 0 0 0 0 20 0 1 0 511349111 0 0 18446744073709551615 0 0 0 0 0 0 0 0 0 18446744073709551615 0 0 17 5 0 0 0 0 0 0 0 0 0 0 0 0 0

Contents of /proc/15033/status:
Name:   exitcode
State:  Z (zombie)
Tgid:   15033
Ngid:   0
Pid:    15033
PPid:   15032
TracerPid:  0
Uid:    9508    9508    9508    9508
Gid:    1240    1240    1240    1240
FDSize: 0
Groups: 297 1240 1360 8714 
Threads:    1
SigQ:   0/71487
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
CapAmb: 0000000000000000
Seccomp:    0
Cpus_allowed:   ff
Cpus_allowed_list:  0-7
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:  0
voluntary_ctxt_switches:    3
nonvoluntary_ctxt_switches: 1

PID 15033 exited normally with status = 23 (0x1700)
$ siginfo47 exitcode -s 13
PID: 15057: exitcode -s 13
Parent PID 15056: pausing while PID 15057 runs
Parent PID 15056: unpaused
Stashed information:
  Signal:       17
  si_signo:     17
  si_code:      2
                [CLD_KILLED] Child has terminated abnormally and did not create a core file.
  si_pid:       15057
  si_uid:       9508
  si_addr:      0x252400003AD1
  si_status:    2
  si_value.int: 13
Contents of /proc/15057/stat:
15057 (exitcode) Z 15056 15056 13211 34827 15056 4228108 177 0 0 0 0 0 0 0 20 0 1 0 511350462 0 0 18446744073709551615 0 0 0 0 0 4096 0 0 0 18446744073709551615 0 0 17 5 0 0 0 0 0 0 0 0 0 0 0 0 0

Contents of /proc/15057/status:
Name:   exitcode
State:  Z (zombie)
Tgid:   15057
Ngid:   0
Pid:    15057
PPid:   15056
TracerPid:  0
Uid:    9508    9508    9508    9508
Gid:    1240    1240    1240    1240
FDSize: 0
Groups: 297 1240 1360 8714 
Threads:    1
SigQ:   1/71487
SigPnd: 0000000000001000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
CapAmb: 0000000000000000
Seccomp:    0
Cpus_allowed:   ff
Cpus_allowed_list:  0-7
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:  0
voluntary_ctxt_switches:    2
nonvoluntary_ctxt_switches: 1

PID 15057 died from signal number = 13 (0x000D)
$ exitcode -h
Usage: exitcode [-hV] [-s signal] [exit-status]
  -h         Print this help message and exit
  -s signal  Kill self with signal number
  -V         Print version information and exit

$

As noted in the help message, the exitcode program terminates either normally with the given exit status (exitcode 23) or by killing itself with a signal (exitcode -s 13).

Jonathan Leffler
  • See also [How to get the return value of child process to its parent which was created using exec?](https://stackoverflow.com/a/52733125/15168) for another (brief) discussion of this technique. – Jonathan Leffler Oct 26 '18 at 21:49
  • thank you for this question and detailed answer, Jonathan. I found the example code in this answer works well, but also found that it doesn't work for /proc//maps and /proc//smaps. It looks like those two files are already empty when SIGCHLD is emitted. I created a new question about this. It would be appreciated if you know workaround of this issue and share it. thanks https://stackoverflow.com/questions/66568052/how-to-read-proc-pid-maps-of-a-child-process-just-before-the-child-process-term – joybro Mar 10 '21 at 15:41