
I want to parse the output of an external program (some shell command) line by line using Perl. The command runs continuously, so I put it into a thread and use shared variables to communicate with my main routine.

Up to now, my code looks similar to this:

#!/usr/bin/perl

use warnings;
use strict;
use threads;
use threads::shared;

my $var :shared; $var="";

threads->create(
    sub {
        # command writes to stdout each ~100ms
        my $cmd = "<long running command> |";
        open(README, $cmd) or die "Can't run program: $!\n";
        while(<README>) {
            my $line = $_;
            # extract some information from line
            $var = <some value>;
            print "Debug\n";
        }
        close(README);
    }
);

while(1) {
    # evaluate variable each ~second
    print "$var\n";
    sleep 1;
}

For some commands this works perfectly fine and the lines are processed just as they come in. Output would be similar to:

...
Debug
Debug
...
<value 1>
...
Debug
Debug
...
<value 2>
...

However, for other commands this behaves strangely and the lines are processed block-wise, so $var doesn't get updated and Debug isn't printed for some time. Then, suddenly, the output is (similar to):

...
<value 1>
<value 1>
<value 1>
...
Debug
Debug
Debug
...
<value 20>

and $var is set to the last/current value. Then this repeats: the parsing is always delayed and done in blocks, and $var is not updated in between.

First of all: is there any better/proper way to parse the output of an external program (line by line!) besides using the pipe?

If not, how can I avoid this behaviour?

I've read that using `autoflush(1);` or `$| = 1;` might be a solution, but only for the "currently selected output channel". How would I use that in my context?
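
For reference, this is the selected-handle idiom as I understand it (just a sketch; as far as I can tell it would only change how my own script flushes its handles, not how the external command buffers its output):

use IO::Handle;

# per-handle autoflush on this script's STDOUT
STDOUT->autoflush(1);

# older equivalent, using the "currently selected output channel"
my $old_fh = select(STDOUT);
$| = 1;
select($old_fh);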

Thank you in advance.

raidlman
  • It's "long running command"'s output that isn't being flushed. There's a utility called `unbuffer` which fools programs that follow the convention of line-buffering only when connected to a terminal. – ikegami Sep 23 '14 at 15:29
  • Is there a way to control the command's flushing behaviour? But the command prints every 100ms to `stdout`. Where is the difference between shell stdout and piping it to perl? – raidlman Sep 23 '14 at 15:34
  • I'd look at running the long running command from the command line and piping to `perl -n script`. – marneborn Sep 23 '14 at 15:35
  • Re "Is there a way to control the command's flushing behaviour?", You mean other than the only I provided? I don't know, what command is it? – ikegami Sep 23 '14 at 15:39
  • Re "Where is the difference between shell stdout and piping it to perl?" You redirected the output away from a terminal. Like Perl, most programs line-buffer output to STDOUT if it's connected to a terminal, or block-buffer it outherwise. – ikegami Sep 23 '14 at 15:40
  • @marneborn But what if I have several threads which are parsing the output of various commands? I think this is not possible this way. – raidlman Sep 23 '14 at 15:40
  • @marneborn, Not gonna help. The piping is what is "breaking" things. – ikegami Sep 23 '14 at 15:41
  • @ikegami So changing `$cmd = "<long running command> |";` to `$cmd = "unbuffer <long running command> |";` should work? (See the sketch after these comments.) – raidlman Sep 23 '14 at 15:46
  • Yup. You could also create pseudo-ttys yourself. IPC::Run can do that easily. – ikegami Sep 23 '14 at 21:40
  • @ikegami unbuffer seems to be working, but I'd prefer a Perl-ish solution if there is one. See also my comment on _Calle Dybedahl's_ answer. – raidlman Sep 24 '14 at 12:10
  • Nothing unperlish about using `unbuffer`, and I already mentioned how to do it in Perl. – ikegami Sep 24 '14 at 12:44
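
To illustrate the `unbuffer` suggestion from the comments, the pipe open in the question would change roughly like this (a sketch; it assumes `unbuffer` from the expect package is installed, and the command itself is still a placeholder):

# unbuffer gives the child a pseudo-tty, so it thinks it is writing to a
# terminal and keeps line-buffering even though we read it through a pipe
my $cmd = "unbuffer <long running command> |";
open(my $readme, $cmd) or die "Can't run program: $!\n";
while (my $line = <$readme>) {
    # each line now arrives roughly as soon as the command prints it
}
close($readme);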

2 Answers


In the general case, your script cannot change the buffering of the child process' output. In some specific cases you may be able to do so by starting it with appropriate switches, but that's about it.

I would recommend that instead of writing your own code to do the running and reading, you re-write your script to use the IPC::Run module. It exists to solve exactly this sort of problem. The documentation isn't the best ever, but the module itself is well-tested and solid.

Calle Dybedahl
  • How do I send the command to the background while parsing the output in my main routine? Based on the documentation I tried to `start()` and `pump()` my command: `my $h = start \@cmd, \$in, \$out; pump $h; print $out;`. But instead of getting just one line, I get a whole block of lines. If I pump repeatedly (without finishing), the command gets re-executed instead of continued. Am I missing something? – raidlman Sep 24 '14 at 12:06
  • As I previously mentioned, it's the pseudo-tty creation ability of IPC::Run that helps. Did you use it? – ikegami Sep 24 '14 at 12:45

Thanks to ikegami and Calle Dybedahl I found the following solution for my problem:

#!/usr/bin/perl

use warnings;
use strict;
use threads;
use threads::shared;
use sigtrap qw(handler exit_safely normal-signals stack-trace error-signals);
use IPC::Run qw(finish pump start);

# define shared variable
my $var :shared; $var="";

# define long running command
my @cmd = ('<long running command>','with','arguments');
my $in = '';
my $out = '';
# start harness
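# ('<pty<' / '>pty>' attach the child to a pseudo-tty, so it line-buffers
# its output instead of block-buffering it as it would when writing to a pipe)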
my $h = start \@cmd, '<pty<', \$in, '>pty>', \$out;

# create thread
my $thr = threads->create(
    sub {
        while (1) {
            # pump harness
            $h->pump;
            # extract some information from $out
            $var = <some value>;
            # empty output
            $out = '';
        }
    }
);

while(1) {
    # evaluate variable each ~second
    print "$var\n";
    sleep 1;
}

sub exit_safely {
    my ($sig) = @_;
    print "Caught SIG $sig\n";
    # harness has to be killed, otherwise
    # it will continue to run in background
    $h->kill_kill;
    $thr->join();
    exit(0);
}

exit(0);
raidlman