
I'm having trouble finding out how many bytes are arriving on stdin through a pipe. I know a lot of you will be furious at this question, but just hear me out.

Half of it already works:

$ echo "BYE" | ./my_prog

In the Linux shell this prints 4, which is exactly what I want.

The problem appears when I try to feed it some bytes: the first run works, but every run after that does not.

$ ./create_bytes.py -n 200 | ./my_prog
200
$ ./create_bytes.py -n 200 | ./my_prog
0

I can't understand why, since I'm sure the stream is always the same length.

This is the code I'm using:

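#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main (int argc, char *argv[]) {
    struct stat fd_s;
    /* Ask the OS for metadata about whatever stdin is connected to. */
    if (fstat(STDIN_FILENO, &fd_s) == -1) {
        perror("fstat(fdin)");
        exit(EXIT_FAILURE);
    }
    /* st_size is an off_t; cast it to match the %lld format. */
    printf("%lld\n", (long long)fd_s.st_size);
    ...
}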

Thanks in advance

EDIT: This is the actual request: read a stream of lines (byte sequences terminated by \n) from stdin in 16-byte blocks. No line can be longer than 128 bytes.

Maybe I'm just making it more difficult than it should be? I hope this helps. Thanks.

  • What is the content of the generated byte buffer which fails? – Jib Oct 04 '22 at 18:30
  • Are the first and second buffer the exact same content? It would be great to have you post it within the question. – Jib Oct 04 '22 at 18:36
  • This looks like an XY Problem to me. – Dúthomhas Oct 04 '22 at 18:43
  • Your code does not compile. – Cheatah Oct 04 '22 at 18:50
  • I'm not "furious" at this question, but I will tell you bluntly what you do not want to hear: your code will never work. `fstat` will never give you a reliable `st_size` when applied to a pipe like this. – Steve Summit Oct 05 '22 at 03:15
  • If you were to invoke `./create_bytes.py -n 100000000 | ./my_prog`, how would you expect it to work? – Steve Summit Oct 05 '22 at 03:17
  • @SteveSummit It should print out 100000000 I guess – Thomas Perticaroli Oct 05 '22 at 08:14
  • When you open a water tap... how much water do you expect to flow out of it? Note: you don't know where the tap is connected to; may be a bottle, may be a jug, a reservoir, the ocean, ... *A C stream (like `stdin`) works just like the water tap* – pmg Oct 05 '22 at 08:31
  • It's more like: 5 bytes flowed out of it or 100000000, no? – Thomas Perticaroli Oct 05 '22 at 08:33
  • If you request 5 bytes and receive 5 bytes, you have absolutely no idea if that was it or if there are 9999999995 more bytes behind the tap. – pmg Oct 05 '22 at 08:40
  • yes, but I just want to know that those are 5 bytes – Thomas Perticaroli Oct 05 '22 at 08:54
  • @ThomasPerticaroli *It should print out 100000000 I guess* No, it can't. See the answer I just posted. – Steve Summit Oct 05 '22 at 13:41
  • @ThomasPerticaroli *yes, but I just want to know that those are 5 bytes* If you call `fread(buf, 1, 5, stdin)`, and if `fread` returns 5, that tells you that there were *at least* 5 bytes available on standard input. `fstat` has nothing to do with it. But as pmg said, you have no way of predicting, in general, what the next call to `fread` might return. – Steve Summit Oct 05 '22 at 13:46

3 Answers

If the input is a pipe, it doesn't have a size. It's a stream that in principle can go on forever. The fact that the first time you ran it it gave you a number is not something you can rely on.

If you want to read everything from stdin into memory, you need to read data in a loop, and have a buffer that you realloc() when it is full and there is still more data to be read.

If you need to read in a text file and are going to process it line by line, you can consider using the POSIX function getline(), or you might even read a whole file with getdelim() if you are sure it doesn't contain a given delimiter.

G. Sliepen

You've run into an ill-defined corner case. POSIX specifies that fstat returns the struct stat info of a file associated with a file descriptor. But what happens when the file descriptor does not correspond to a file is not really defined. You might expect the stat call to return an error (and I'm sure there are some systems that do so), but on most systems it returns some information about the object the file descriptor refers to. What info depends on the OS and the type of the object.

On Linux with a pipe (the case you seem to be using) it will always return st_size = 0 (which implies you are using something other than Linux). I would imagine there are systems that return with st_size set to the amount of data buffered in the pipe, as that seems a useful piece of information. Your results seem consistent with that.

Chris Dodd
  • *You've run into an ill-defined corner case.* I wouldn't call it a corner case - it's more like asking "What does the number `5` smell like?" It's just not applicable. – Andrew Henle Oct 05 '22 at 00:16
  • @AndrewHenle: That's in keeping with "it should return an error". The fact that it doesn't in the POSIX spec is what is ill-defined. – Chris Dodd Oct 05 '22 at 20:12

In a comment I asked

If you were to invoke ./create_bytes.py -n 100000000 | ./my_prog, how would you expect it to work?

and you replied

It should print out 100000000 I guess

So let's think about this, and ask: How could this possibly work?

The create_bytes.py script is going to write 100,000,000 bytes. Where do they go? Into the pipe.

But what happens over in my_prog? It doesn't actually read any characters from the pipe, it just asks, what is the "size" of the pipe?

But if create_bytes.py has written 100,000,000 characters, and if my_prog hasn't read them, where are they? Are they all "in the pipe"? And the answer is, no, they are not.

Pipes have a finite capacity. If they fill up, and if the reader doesn't read characters out fast enough, the operating system automatically puts the writing process to sleep. The writing process isn't woken up again, isn't given the opportunity to write any more characters, until some empty space has cleared up in the pipe for it to write into again.

My point is that if pipes have a finite capacity (as I assert that they do), it's impossible for the example I posed to print "100000000", for the simple reason that there is no piece of code, anywhere, that can possibly read and count those characters.

You might imagine that fstat ought to read and count them in this situation somehow, but (a) it doesn't and (b) it couldn't. If fstat read characters from the pipe so it could count them, the characters would be gone. If your program then tried to read them (perhaps down below the ... you had in your code fragment), it wouldn't be able to read them, and that would be Wrong.

But, to convince yourself, I encourage you to try that invocation

./create_bytes.py -n 100000000 | ./my_prog

and see what you get. I'll bet you $100 you don't get "100000000", but the result you do get might be interesting.

I don't have your create_bytes.py script, so instead I tried

yes | a.out

yes is a standard Unix program that prints "y" an infinite number of times. a.out was where I'd just compiled your test program, after fixing it up a bit. And, on my machine, it printed

65536

So evidently, on my machine, when fstat is called on a file descriptor that's connected to a pipe, fstat fills in st_size with the size of the contents of the pipe, and on my machine, pipes evidently have a capacity of 65536, which is of course 2^16.

Steve Summit