
I have a Perl script that receives input piped from another program. STDIN is buffered with an 8K (the Ubuntu default) input buffer, which is causing problems. I'd like to switch to line buffering or disable buffering completely. There doesn't seem to be a good way to do this. Any suggestions?

use strict;
use warnings;
use IO::Handle;
use IO::Poll qw[ POLLIN POLLHUP POLLERR ];
use Text::CSV;

my %config = ( poll_timeout => 1 );   # seconds
my $csv = Text::CSV->new({ binary => 1 });

my $stdin = IO::Handle->new;
$stdin->fdopen(fileno(STDIN), 'r');
$stdin->setbuf(undef);

my $poll = IO::Poll->new() or die "cannot create IO::Poll object";
$poll->mask($stdin => POLLIN);

STDIN->blocking(0);

my $halt = 0;
for (;;) {
    $poll->poll($config{poll_timeout});

    for my $handle ($poll->handles(POLLIN | POLLHUP | POLLERR)) {
        next unless $handle eq $stdin;

        if (eof($stdin)) {
            $halt = 1;
            last;
        }

        my $row = $csv->getline($stdin);   # returns an array ref
        # Do more stuff here
    }

    last if $halt;
}

Polling STDIN kind of throws a wrench into things, since IO::Poll works alongside buffered reads while direct calls like sysread bypass the buffer (and the two can't be mixed). I don't want to spin in a loop calling non-blocking sysread, either. I require select or poll since I don't want to hammer the CPU.

PLEASE NOTE: I'm talking about STDIN, NOT STDOUT. $|++ is not the solution.

[EDIT] Updating my question to clarify based on the comments and other answers.

The program that is writing to STDOUT (on the other side of the pipe) is line buffered and flushed after every write. Every write contains a newline, so in effect, buffering is not an issue for STDOUT of the first program.

To verify this is true, I wrote a small C program that reads piped input from the same program with STDIN buffering disabled (setvbuf with _IONBF). The input appears in STDIN of the test program immediately. So, sadly, the problem does not appear to be with the output of the first program. [/EDIT]

Thanks for any insight!

PS. I have done a fair amount of Googling. This link is the closest I've found to an answer, but it certainly doesn't satisfy all my needs.

jtv4k
  • http://stackoverflow.com/a/21956697/223226 – mpapec May 06 '14 at 18:38
  • Your example code wouldn't "hammer the CPU" if you changed it to read STDIN line-by-line in the usual way. CPU usage only becomes a concern when you're doing non-blocking reads so that the program can do something else while it's waiting for the data. Maybe your real program is doing something more sophisticated. – Kenster May 06 '14 at 18:58
  • Re "I don't want to infinitely call sysread without no blocking", Why not call `sysread` with blocking? `sysread` will always return as soon as data arrives. (Won't solve your problem, but it makes your program a whole lot simpler.) – ikegami May 06 '14 at 19:12
  • I can't use blocking because other tasks are handled that aren't included in the pasted code (child reaping, etc). That leaves me with the option to infinitely loop over the sysread (with no blocking) that (if I recall correctly) will peg the CPU. – jtv4k May 06 '14 at 21:18

2 Answers


You're actually talking about the other program's STDOUT. The solution is $|=1; (or equivalent) in the other program.

If you can't, you might be able to convince the other program to use line buffering instead of block buffering by connecting its STDOUT to a pseudo-tty instead of a pipe (like Expect.pm does, for example).

The unix program expect has a tool called unbuffer which does exactly that. (It's part of the expect-dev package on Ubuntu.) Just prefix the command name with unbuffer, e.g. `unbuffer other_program | ./my_script.pl`.

ikegami
  • Yeah, you have no control over how the other program handles buffering. You'll have to fix this at the source. – tadman May 06 '14 at 19:09
  • @tadman, A few programs give you a command line option to disable buffering, but that's very rare. – ikegami May 06 '14 at 19:11
  • Thankfully, the program performing the output is line buffered. It makes an explicit flush after each line of output. Each line that is written ends with a newline character. I believe it's STDIN that's buffered. – jtv4k May 06 '14 at 21:17
  • Also, I'm confident that STDOUT is flushed from the primary program because I wrote a test program in C, disabled buffering completely on STDIN, and received the full data instantly. – jtv4k May 06 '14 at 21:21
  • It could be that IO::Poll requires that you use `sysread` for the same [reason](http://www.perlmonks.org/?node_id=815969) IO::Select does. – ikegami May 06 '14 at 22:32
  • Good find! IO::Select provides the functionality I need from IO::Poll, so I'll use that with the sysread option. It's more complicated, but hopefully it will do the job. – jtv4k May 07 '14 at 00:19
  • Wasn't really a "find", since I wrote the linked post :) (You can probably find other useful IO::Select snippets by me on PerlMonks.) Also, you don't need to stop using IO::Poll; the solution for IO::Select will work for IO::Poll too. Anyway, I've posted a new answer. Leaving this one since it could still be useful to some. – ikegami May 07 '14 at 00:44

Say there are two short lines in the pipe's buffer.

IO::Poll notifies you there's data to read, which you proceed to read (indirectly) using readline.

Reading one character at a time from a file handle is very inefficient. As such, readline (aka <>) reads a block of data from the file handle at a time. Both lines end up in Perl's buffer, and the first of the two is returned.

Then you wait for IO::Poll to notify you that there is more data. It doesn't know about Perl's buffer; it just knows the pipe is empty. As such, it blocks.
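This stranded-buffer behaviour can be reproduced with a short, self-contained sketch (the forked child standing in for the "other program", and the line contents and timings, are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Poll qw(POLLIN);

# The child writes two lines in a single syswrite and then keeps the
# pipe open. The parent shows that after readline buffers both lines,
# poll() times out even though a full line is still waiting in Perl's
# buffer.
pipe(my $r, my $w) or die "pipe: $!";

my ($first, $n_ready, $second);

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid) {                           # parent: the Perl reader
    close $w;
    my $poll = IO::Poll->new;
    $poll->mask($r => POLLIN);

    $poll->poll(5);                   # data arrives: handle is readable
    $first = <$r>;                    # readline slurps BOTH lines into
                                      # Perl's buffer, returns the first

    $n_ready = $poll->poll(1);        # the pipe itself is now empty, so
                                      # this times out and returns 0 ...
    $second = <$r>;                   # ... yet a line was still buffered

    print "first: $first";
    print "poll saw $n_ready ready handles; buffered line: $second";
    waitpid($pid, 0);
} else {                              # child: the "other program"
    close $r;
    syswrite($w, "line1\nline2\n");   # two lines, one write
    sleep 3;                          # keep the pipe open past the poll
    exit 0;
}
```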

This post demonstrates the problem. It uses IO::Select, but the principle (and solution) is the same.
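The fix is to do your own line splitting on top of sysread, which bypasses Perl's I/O buffer so the readiness reports from select or poll always match what a read will actually see. A minimal sketch (the sub name, buffer size, and 1-second timeout are illustrative, not from the original post):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;

# Read complete lines from a handle using select() + sysread. sysread
# never leaves data stranded in Perl's buffer, so can_read() stays
# truthful; waiting in can_read() avoids hammering the CPU.
sub process_handle {
    my ($fh, $on_line) = @_;
    my $sel = IO::Select->new($fh);
    my $buf = '';

    while (1) {
        next unless $sel->can_read(1);      # wait up to 1s; no busy loop

        my $n = sysread($fh, $buf, 8192, length $buf);
        die "sysread: $!" unless defined $n;
        last if $n == 0;                    # EOF: writer closed the pipe

        while ($buf =~ s/\A([^\n]*)\n//) {  # emit complete lines only;
            $on_line->($1);                 # a partial line stays in $buf
        }
    }
}

# As in the question, consume piped input on STDIN (only when STDIN
# is actually a pipe):
process_handle(\*STDIN, sub { print "got: $_[0]\n" }) if -p STDIN;
```

The same `$buf`-and-regex approach works unchanged with IO::Poll; only the readiness call differs.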

ikegami