
I am trying to run bp_genbank2gff3.pl (from the BioPerl package) from another Perl script that gets a GenBank file as its argument.

This does not work (no output files are generated):

   my $command = "bp_genbank2gff3.pl -y -o /tmp $ARGV[0]";

   open( my $command_out, "-|", $command );
   close $command_out;

but this does:

   open( my $command_out, "-|", $command );
   sleep 3; # why do I need to sleep?
   close $command_out;

Why?

I thought that close is supposed to block until the command is done:

"Closing any piped filehandle causes the parent process to wait for the child to finish..." (see http://perldoc.perl.org/functions/open.html).

Edit

I added this as the last line (`$ret` holds the return value of `close`):

say "ret=$ret, \$?=$?, \$!=$!";

and in both cases the printout is:

ret=, $?=13, $!=

(which means close failed in both cases, right?)
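For reference, the wait status in `$?` can be decoded with the core POSIX module; a minimal sketch, assuming `$command_out` is the piped filehandle from above:

   use POSIX ":sys_wait_h";

   my $ret = close $command_out;
   if    ( WIFEXITED($?) )   { printf "child exited with status %d\n", WEXITSTATUS($?); }
   elsif ( WIFSIGNALED($?) ) { printf "child was killed by signal %d\n", WTERMSIG($?); }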

  • What is the return value of `close()`? What is `$?` and perhaps `$!`? Does `bp_genbank2gff3.pl`, or the shell expansion of `$ARGV[0]`, fork and exit? What does `strace` or `truss` say is going on? Are you sure the output files in the "working" case aren't left over from an unrelated, successful job? Can you reproduce the problematic behavior with some commonly available shell utility, rather than `bp_...3.pl`? – pilcrow Aug 13 '10 at 15:08
  • @pilcrow see edit. `strace` returns a very long list, what should I look for? I'm sure the output is not a leftover when `sleep` is on (I delete the entire content of the dir before I run). I didn't understand your question re the fork. BTW: http://github.com/bioperl/bioperl-live/blob/master/scripts/Bio-DB-GFF/genbank2gff3.PLS – David B Aug 13 '10 at 15:41
  • `strace -fe trace=process my_perl_script` should get you started. However, @mobrule figured it out from `$?`. – pilcrow Aug 13 '10 at 16:15

2 Answers


$? = 13 means your child process was terminated by a SIGPIPE signal. Your external program (bp_genbank2gff3.pl) tried to write some output to the pipe connected to your Perl program, but the Perl program had closed its end of the pipe, so the OS sent a SIGPIPE to the external program.

By sleeping for 3 seconds, you are letting the external program run for 3 seconds before the OS kills it, so it has a chance to get something done. Note that pipes have a limited capacity, though: if your parent Perl script is not reading from the pipe and the external program writes a lot to standard output, the external program's write operations will eventually block, and you may not really get 3 seconds of work out of it.

The workaround is to read the output from the external program, even if you are just going to throw it away.

open( my $command_out, "-|", $command );
my @ignore_me = <$command_out>;    # drain the pipe until the child finishes writing
close $command_out;


Update: If you really don't care about the command's output, you can avoid SIGPIPE issues by redirecting the output to /dev/null:

open my $command_out, "-|", "$command > /dev/null";
close $command_out;     # succeeds, no SIGPIPE

Of course if you are going to go to that much trouble to ignore the output, you might as well just use system.
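For instance, a minimal sketch of the system route, assuming you only care about the exit status (the status-checking idiom is the one documented in perldoc -f system):

system("$command > /dev/null");
if    ( $? == -1 )  { warn "failed to execute: $!\n"; }
elsif ( $? & 127 )  { warn "child died with signal ", $? & 127, "\n"; }
elsif ( $? >> 8 )   { warn "child exited with status ", $? >> 8, "\n"; }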


Additional info: As the OP says, closing a piped filehandle causes the parent to wait for the child to finish (by using waitpid or something similar). But before it starts waiting, it closes its end of the pipe. In this case, that end is the read end of the pipe that the child process is writing its standard output to. The next time the child tries to write something to standard output, the OS detects that the read end of that pipe is closed and sends a SIGPIPE to the child process, killing it and quickly letting the close statement in the parent finish.
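The effect is easy to reproduce with a stock shell utility, as pilcrow suggested in the comments. A minimal sketch, assuming a Unix-like system with seq on the PATH:

#!/usr/bin/perl
use strict;
use warnings;

# seq's output far exceeds the pipe capacity, so the child is still
# writing when the parent closes the read end of the pipe; the child's
# next write then draws a SIGPIPE.
open( my $fh, "-|", "seq 1 1000000" ) or die "open failed: $!";
close $fh;
print "\$? = $?\n";    # prints 13: terminated by signal 13 (SIGPIPE)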

  • I don't get something. My external program indeed tried to write some output (something like "working on..." at the beginning, then "done" at the end). But what do you mean by "the perl program closed its end of the pipe"? In other words, what exactly does `my @ignore_me = <$command_out>;` do that makes such a difference? – David B Aug 13 '10 at 15:55
  • @David B - You get a `SIGPIPE` when one process (your child, in this example) writes to a pipe that the reader (your parent) has already closed. By writing `<$command_out>` in the parent, you are keeping the reading end of the pipe open until the writing end of the pipe finishes. – mob Aug 13 '10 at 16:22
  • It reads what the child is trying to write. That makes a difference. By not reading the output of the program to which you opened a pipe, you are turning on the water to the garden hose while plugging the tip of the hose. If you don't want the program's output, use `system`. – Sinan Ünür Aug 13 '10 at 16:26
  • @mobrule, @Sinan and @pilcrow - thank you all. This was interesting. – David B Aug 13 '10 at 16:36

I'm not sure what you're trying to do, but `system` is probably better in this case...

  • I use `open` since I would like to read the command's stdout on the fly, which can't be done using `system` as far as I know (`system` waits for the command to finish and then returns all the output at once). The example above is just a simplified version which does not include this part, but it is immaterial to the problem. – David B Aug 13 '10 at 14:05
  • Ok, so you need a `while (<$command_out>) { do stuff }` between your open and close (see the sketch after this thread). – sebthebert Aug 13 '10 at 14:31
  • This is a generic method I'm using. So sometimes I indeed want to do something with what the command sends to stdout, and then I really use the `while` as you suggested. But if I don't, why should the command not execute correctly? The command we're discussing here, for example, gets a filename of some format and splits its data into two files of different formats. It generates files and also prints some logging messages to stdout. In this case, I don't care about the stdout, so I don't use a `while`. Why should this matter? – David B Aug 13 '10 at 14:34
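A sketch of that suggestion, reading the command's output line by line as it arrives (reusing the hypothetical $command from the question):

open( my $command_out, "-|", $command ) or die "open failed: $!";
while ( my $line = <$command_out> ) {
    print "command says: $line";    # do stuff with each line as it arrives
}
close $command_out or warn "command failed, \$? = $?";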