Linux unbuffered reads from STDIO

Question

I'm trying to write a clone of wc -l that displays partial results as it receives input (for example,

My current version is a simple loop:

    while(!feof(in) &&
            //(readc=fread(buf, 1,BUFSIZE,in))) {
            (readc=read(0,buf, BUFSIZE))) {
            for(i=0;i<readc;i++) {
                    lines += (buf[i] == '\n');
            }
    }

The problem is that my stdin is still getting block-buffered. The entire point of this exercise is to have output not have to wait for each 4KB block to fill. I suppose line-buffering would be fine.

Example application: find | partial_wc

awk 'NR%1000==0 {printf "%d\r",NR} END {print NR}' has a similar output, except that I would like to choose to output based on time (every 1s, for example), rather than rows. Also, it's an interesting learning question.

I tried taking the advice given in why grep is fast, but can't figure out which set of system calls to use.

The documentation for fflush says: "For input streams, fflush() discards any buffered data that has been fetched from the underlying file, but has not been consumed by the application." That doesn't sound like it will do what I'm looking for. – zebediah49 Jul 25 '13 at 03:48
Note that `while (!feof(x))` is pretty much guaranteed to be wrong. – caf Jul 25 '13 at 05:27

score 4 · Answer 1 · answered Jul 25 '13 at 03:55

Sure, try the POSIX terminal control API:

    #include <termios.h>
    #include <unistd.h>   // STDIN_FILENO

    struct termios ctrl;
    tcgetattr(STDIN_FILENO, &ctrl);
    ctrl.c_lflag &= ~ICANON; // turning off canonical mode makes input unbuffered
    tcsetattr(STDIN_FILENO, TCSANOW, &ctrl);

Did not make the desired change. I assume that's because this changes the terminal control, and my input is from a previous process? My canonical test is comparing `find | my_wc` against `unbuffer find | my_wc` -- the first returns 4096 bytes every time, while the second returns <4096 most times. – zebediah49 Jul 25 '13 at 04:14
@zebediah49 Sorry, I don't understand what you mean by "previous process". Yes, this changes the terminal control, but isn't that what you want? How else would you turn off buffering if not in the terminal? – Jul 25 '13 at 04:23
I was referring to it as `previous_process | this_process`. I assumed that the terminal changes affected 'in-terminal' (as in, keyboard-based input). This is supported by the result that your response does the stated goal when stdin is connected to my keyboard, but it does not when stdin is connected to `find`. – zebediah49 Jul 25 '13 at 04:30
This would not work. The question is clearly for the case where stdin is piped from the output of another program, but your answer is for the case where stdin is interactive. – This isn't my real name Jul 25 '13 at 04:30
@zebediah49 If this doesn't work no matter what, then you can't do this, sorry. That's what Elchonon Edelson's answer implies too. – Jul 25 '13 at 04:43
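
The comments above come down to the fact that termios settings apply only to terminal devices. As a minimal sketch (the helper name `disable_canonical` is hypothetical, not part of the answer), the termios calls can be guarded with `isatty()`, which makes the pipe case visible instead of silently doing nothing:

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: turn off canonical mode on fd, but only if fd
 * refers to a terminal. Returns 0 on success, -1 if fd is not a
 * terminal or a termios call fails. */
int disable_canonical(int fd)
{
    struct termios ctrl;

    if (!isatty(fd)) {
        /* A pipe or regular file has no terminal line discipline,
         * so there is nothing for termios to configure. */
        return -1;
    }
    if (tcgetattr(fd, &ctrl) == -1)
        return -1;
    ctrl.c_lflag &= ~ICANON;
    return tcsetattr(fd, TCSANOW, &ctrl);
}
```

With `find | my_wc`, file descriptor 0 is the read end of a pipe, so this helper would return -1 and the terminal settings would have no bearing on how the data arrives.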

score 2 · Answer 2 · answered Jul 25 '13 at 04:29

The problem is not that your stdin is being block-buffered; the problem is that the stdout of the process generating your data is being block-buffered. If you control the entire process chain of your data pipe, you can use unbuffer to work around that, but in the general case there's no way for your program to change the buffering of the output stream of the previous program in the pipe.

This isn't my real name

Well this seems like the canonical answer then. I thought there should be a way though, given that my test-case, `find | unbuffer -p cat | my_wc` has the desired behavior (`find | my_wc` does not). – zebediah49 Jul 25 '13 at 04:31
Admittedly, your stdin may be block-buffered as well, but that isn't the cause of the problem. Incidentally, you're mixing two different types of I/O here: the `feof()` function goes with `fread()` (C-library buffered I/O), not with `read()` (POSIX I/O). – This isn't my real name Jul 25 '13 at 04:37
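
Following the remarks about `feof()` and `read()`, here is one possible shape for the reading side that uses only POSIX I/O and reports on a timer. It is a sketch, not the asker's final program: the names `count_newlines` and `count_lines_fd` and the one-second interval are assumptions chosen for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define BUFSIZE 4096

/* Count newline bytes in buf[0..n). */
size_t count_newlines(const char *buf, size_t n)
{
    size_t lines = 0;
    for (size_t i = 0; i < n; i++)
        lines += (buf[i] == '\n');
    return lines;
}

/* Hypothetical helper: count lines arriving on fd. If `progress` is
 * non-NULL, write the running total to it at most once per second.
 * Returns the final count, or -1 on a read error. */
long count_lines_fd(int fd, FILE *progress)
{
    char buf[BUFSIZE];
    long lines = 0;
    time_t last = time(NULL);

    for (;;) {
        ssize_t readc = read(fd, buf, sizeof buf);
        if (readc == 0)
            break;                  /* read() returning 0 means end of input */
        if (readc < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return -1;
        }
        lines += (long)count_newlines(buf, (size_t)readc);

        time_t now = time(NULL);
        if (progress != NULL && now != last) {
            fprintf(progress, "%ld\r", lines);
            fflush(progress);
            last = now;
        }
    }
    return lines;
}
```

A `main` could call `count_lines_fd(STDIN_FILENO, stderr)` and print the result. The loop ends on the return value of `read()` rather than on `feof()`, and `read()` returns as soon as any data is available in the pipe. The progress line is refreshed only when data arrives, so it stays unchanged while the producer is holding output in its own buffer.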
