I am porting a debugger, 'pi' ('process inspector') to Linux and am working on the code for fork/exec of a child to inspect it. I am following standard procedure (I believe) but the wait is hanging. 'hang' is the procedure which does the work, the 'cmd' argument being the name of the binary (a.out) to trace:
int Hostfunc::hang(char *cmd){
char *argv[10], *cp;
int i;
Localproc *p;
struct exec exec;
struct rlimit rlim;
i = strlen(cmd);
if (++i > sizeof(procbuffer)) {
i = sizeof(procbuffer) - 1;
procbuffer[i] = 0;
}
bcopy(cmd, procbuffer, i);
argv[0] = cp = procbuffer;
for(i = 1;;) {
while(*cp && *cp != ' ')
cp++;
if (!*cp) {
argv[i] = 0;
break;
} else {
*cp++ = 0;
while (*cp == ' ')
cp++;
if (*cp)
argv[i++] = cp;
}
}
hangpid = fork();
if (!hangpid){
int fd, nfiles = 20;
if(getrlimit(RLIMIT_NOFILE, &rlim))
nfiles = rlim.rlim_cur;
for( fd = 0; fd < nfiles; ++fd )
close(fd);
open("/dev/null", 2);
dup2(0, 1);
dup2(0, 2);
setpgid(0, 0);
ptrace(PTRACE_TRACEME, 0, 0, 0);
execvp(argv[0], argv);
exit(0);
}
if (hangpid < 0)
return 0;
p = new Localproc;
if (!p) {
kill(9, hangpid);
return 0;
}
p->sigmsk = sigmaskinit();
p->pid = hangpid;
if (!procwait(p, 0)) {
delete p;
return 0;
}
if (p->state.state == UNIX_BREAKED)
p->state.state = UNIX_HALTED;
p->opencnt = 0;
p->next = phead;
phead = p;
return hangpid;
}
I put the 'abort()' in to catch a non-zero return from ptrace, but that is not happening. The call to 'raise' seems to be a common practice but a cursory look at gdb's code reveals it is not used there. In any case it makes no difference to the outcome. `procwait' is as follows:
int Hostfunc::procwait(Localproc *p, int flag){
int tstat;
int cursig;
again:
if (p->pid != waitpid(p->pid, &tstat, (flag&WAIT_POLL)? WNOHANG: 0))
return 0;
if (flag & WAIT_DISCARD)
return 1;
if (WIFSTOPPED(tstat)) {
cursig = WSTOPSIG(tstat);
if (cursig == SIGSTOP)
p->state.state = UNIX_HALTED;
else if (cursig == SIGTRAP)
p->state.state = UNIX_BREAKED;
else {
if (p->state.state == UNIX_ACTIVE &&
!(p->sigmsk&bit(cursig))) {
ptrace(PTRACE_CONT, p->pid, 1, cursig, 0);
goto again;
}
else {
p->state.state = UNIX_PENDING;
p->state.code = cursig;
}
}
} else {
p->state.state = UNIX_ERRORED;
p->state.code = WEXITSTATUS(tstat) & 0xFFFF;
}
return 1;
}
The 'waitpid' in 'procwait' just hangs. If I run the program with the above code, and run a 'ps', I can see that 'pi' has forked but hasn't yet called exec, because the command line is still 'pi', and not the name of the binary I am forking. I discovered that if I remove the 'raise', 'pi' still hangs but 'ps' now shows that the forked program has the name of the binary being examined, which suggests it has performed the exec.
So, as far as I can see, I am following documented procedures to take control of a forked process but it isn't working.
Noel Hunt