I am having an issue capturing the return status of a child process. Below is a simplified version of my code.

use Modern::Perl;
use POSIX;
use AnyEvent;

my @jobs = (1, 7, 3, 9 , 4 , 2);
my %pid;
my %running;

my $t = AE::timer 0, 5, sub{
    while (keys(%running) < 3 && @jobs) {
        my $job = shift @jobs;
        $running{$job}=1;
        $pid{$job} = run($job);
    }
    for(keys %running){
        delete $running{$_} unless check($pid{$_},$_);
    }
    exit unless scalar keys %running;
};

AnyEvent->condvar->recv;

sub func_to_run{
    my $id = shift;
    close STDOUT;
    open STDOUT, ">>$id.log";
    exec '/bin/sleep', $id;
}

sub run{
    my $id = shift;
    print "starting job $id\n";
    my $pid = fork();
    return $pid if $pid;
    func_to_run($id);
}

sub check{
    my ($pid,$id) = @_;
    my $result = waitpid($pid, WNOHANG);
    {
        if ($result == $pid) {
            my $rc = $? >> 8;
            print "Job $id finished with code $rc\n";
            return 0;
        }
        elsif ($result == -1 and $! == ECHILD) {
            print "Job $id finished running, not sure if it was sucessfull\n";
            return 0;
        }
        elsif ($result == 0) {
            return 1;
        }
        redo;
    }
}

OUTPUT:

starting job 1
starting job 7
starting job 3
Job 1 finished running, not sure if it was sucessfull
Job 3 finished running, not sure if it was sucessfull
starting job 9
starting job 4
Job 7 finished running, not sure if it was sucessfull
starting job 2
Job 4 finished running, not sure if it was sucessfull
Job 9 finished running, not sure if it was sucessfull
Job 2 finished running, not sure if it was sucessfull

Why is waitpid() returning -1 instead of a return status?

EDIT: I changed system + exit to exec. This was what I was originally doing. My goal is to be able to signal the child process, which I don't actually think can be done with system.

kill('HUP', $pid);

EDIT 2: There can be several child processes running at once, and this is being called from an AE::timer callback. What I want to figure out here is why I am getting -1 from waitpid(), which indicates that the child was already reaped.

EDIT 3: I have changed the code to a full working example, with the output I get.

Smartelf
  • Do you have `$SIG{CHLD} = 'IGNORE';` set somewhere? – Slaven Rezic Sep 11 '13 at 15:49
  • No, I didn't set that signal handler. I have edited the question to be slightly more clear – Smartelf Sep 11 '13 at 17:08
  • Wait, what? You want the child process to signal itself? `kill HUP => $$` But ... why? – pilcrow Sep 11 '13 at 17:22
  • no, I want the parent process to be able to signal the child process. – Smartelf Sep 11 '13 at 17:25
  • Why do you use `WNOHANG`? Note that your code doesn't even declare that symbol (or `ECHILD`)! Always use `use strict; use warnings;` – ikegami Sep 11 '13 at 17:48
  • Wait, are you saying (in update 2) that it's different code than the code you posted that results in the behaviour you described? Please post code that actually exhibits the behaviour you describe. -1, posted code does not exhibit the stated behaviour. – ikegami Sep 11 '13 at 17:53
  • While the code is more complex, essentially the above is what's going wrong. I didn't add the Perl modules being used, but it uses the POSIX module as well as Modern::Perl (which includes both strict and warnings) – Smartelf Sep 11 '13 at 17:55
  • Please, can you extend your sample code to be complete? If I try your code above and add the missing parts (use of POSIX, Errno, strict, print out the $code), then I *cannot* reproduce the problem (unless `$SIG{CHLD}` is set). – Slaven Rezic Sep 11 '13 at 19:21

1 Answer

I checked what your code is actually doing with the strace command on Linux. The following is what you see as one of the sleep commands completes:

$ strace -f perl test.pl
...
[pid  4891] nanosleep({1, 0}, NULL)     = 0
[pid  4891] close(1)                    = 0
[pid  4891] close(2)                    = 0
[pid  4891] exit_group(0)               = ?
[pid  4891] +++ exited with 0 +++
 2061530, 64, 4990) = -1 EINTR (Interrupted system call)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=4891, si_status=0, si_utime=0, si_stime=0} ---
write(4, "\1\0\0\0\0\0\0\0", 8)         = 8
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
clock_gettime(CLOCK_MONOTONIC, {97657, 317300660}) = 0
clock_gettime(CLOCK_MONOTONIC, {97657, 317371410}) = 0
epoll_wait(3, {{EPOLLIN, {u32=4, u64=4294967300}}}, 64, 3987) = 1
clock_gettime(CLOCK_MONOTONIC, {97657, 317493076}) = 0
read(4, "\1\0\0\0\0\0\0\0", 8)          = 8
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED|WCONTINUED, NULL) = 4891
wait4(-1, 0x7fff8f7bc42c, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)
clock_gettime(CLOCK_MONOTONIC, {97657, 317738921}) = 0
epoll_wait(3, {}, 64, 3986)             = 0
clock_gettime(CLOCK_MONOTONIC, {97661, 304667812}) = 0
clock_gettime(CLOCK_MONOTONIC, {97661, 304719985}) = 0
epoll_wait(3, {}, 64, 1)                = 0
...

The lines starting with [pid 4891] are from the sleep command; the rest are from your script. After the SIGCHLD arrives, the script itself calls wait4(-1, ...), which returns 4891, the PID of the sleep process. Your code does not make that call; it presumably comes from the event loop that the script is using. By the time your timer callback calls waitpid(), the child has already been reaped, so waitpid() returns -1 with ECHILD and the exit status is no longer available.
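The effect can be reproduced without any event loop. The following is a minimal sketch using only core Perl: the first waitpid() reaps the child and sees its exit code, and a second waitpid() on the same PID fails with ECHILD, which is the situation your check() sub ends up in. The exit code 3 and the variable names are chosen purely for illustration.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: exit immediately with an arbitrary, recognizable status.
    POSIX::_exit(3);
}

# First wait: blocks until the child exits and reaps it.
my $first = waitpid($pid, 0);
my $code  = $? >> 8;
print "first waitpid returned $first, exit code $code\n";

# Second wait: the child is already reaped, so there is nothing left to collect.
my $second = waitpid($pid, WNOHANG);
my $echild = $!{ECHILD} ? 1 : 0;
print "second waitpid returned $second, ECHILD=$echild\n";
```

In your script, the event loop plays the role of the first waitpid(), and your own call is the second one.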

By the way, the AnyEvent documentation has a section (CHILD PROCESS WATCHERS) on watching child processes and examining their return codes. From the documentation:

my $done = AnyEvent->condvar;

my $pid = fork or exit 5;

my $w = AnyEvent->child (
   pid => $pid,
   cb  => sub {
      my ($pid, $status) = @_;
      warn "pid $pid exited with status $status";
      $done->send;
   },
);

# do something else, then wait for process exit
$done->recv;
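Applied to the job queue from the question, this could look roughly like the sketch below. It is an untested adaptation, not the asker's code: the names %watcher, %exit_code, $max and start_jobs are hypothetical, the job list is shortened, and it assumes the status passed to the callback has the same layout as $?, so the exit code is obtained with >> 8. AnyEvent::detect is called before the first fork because the AnyEvent documentation advises initialising the event model before forking when child watchers are used.

```perl
use strict;
use warnings;
use AnyEvent;

my @jobs = (1, 2, 1);   # sleep durations in seconds (shortened for the example)
my $max  = 2;           # hypothetical limit on concurrent children
my %watcher;            # pid => child watcher; must be kept alive while waiting
my %exit_code;          # job => exit code, filled in by the callbacks
my $done = AnyEvent->condvar;

AnyEvent::detect;       # pick the event model before any child is forked

sub start_jobs {
    while (keys(%watcher) < $max && @jobs) {
        my $job = shift @jobs;
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: become the command, so $pid in the parent is the command itself.
            exec('sleep', $job) or die "exec failed: $!";
        }

        print "starting job $job\n";
        $watcher{$pid} = AnyEvent->child(
            pid => $pid,
            cb  => sub {
                my ($pid, $status) = @_;
                $exit_code{$job} = $status >> 8;
                print "Job $job finished with code $exit_code{$job}\n";
                delete $watcher{$pid};          # this child is done
                start_jobs();                   # refill the free slot
                $done->send unless %watcher;    # nothing running, nothing queued
            },
        );
    }
}

start_jobs();
$done->recv;
```

With this approach there is no polling timer and no direct call to waitpid(); the event loop reaps each child and hands the status to the callback.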

With regard to using system() or exec() to spawn the process, you are correct to use exec(). system() starts the command in a new child process and waits for it, whereas exec() replaces the current process with the command. If the forked child called system(), the $pid returned by fork() would refer to the forked copy of the Perl script, not to the command it runs, so a signal sent to $pid would not reach the command directly.
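As a small illustration of why fork() plus exec() makes signalling straightforward, here is a sketch using only core Perl. It assumes a sleep command is available on the PATH; the 30-second duration and the choice of SIGTERM are arbitrary. Note that Perl's kill takes the signal first and the PID list second.

```perl
use strict;
use warnings;
use POSIX qw(SIGTERM);

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: replaced by the command, so it keeps the PID the parent got from fork().
    exec('sleep', '30') or POSIX::_exit(127);
}

# Parent: signal the child directly, then collect its status.
kill 'TERM', $pid;
waitpid($pid, 0);

my $signal = $? & 127;   # low 7 bits hold the terminating signal number
print "child $pid was terminated by signal $signal\n";
```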

Greg Bacon
Jonathan Barber