
When I say stdin, I am referring to the stream referred to by fd = 0.

I am taking an OS course which covers block and character devices. It specifically said that the keyboard is a character device. However, when we were shown the read syscall, we were told that the kernel doesn't care what it is reading from as long as it is a block device or a file on a block device.

This is the code we were given:

#include <unistd.h>

#define BUFFSIZE 5

int main (void) {
  ssize_t n;
  char buffer[BUFFSIZE];

  /* File descriptors of the three standard streams. */
  int in = STDIN_FILENO;    /* 0 */
  int out = STDOUT_FILENO;  /* 1 */
  int err = STDERR_FILENO;  /* 2 */

  do {
    /* read() returns the byte count, 0 at end of input, -1 on error. */
    n = read (in, buffer, BUFFSIZE);
    if (n < 0) {
      write (err, "Error occurred\n", 15);
    } else {
      write (out, "Entered if\n", 11);
      write (out, buffer, n);
    }
  } while (n > 0);
  return 0;
}

My question is: how does Linux treat standard input (fd = 0)? Is it treated as a character device, or does the kernel do some kind of buffering (which seems likely, judging by the results I got when running the code)?

Additionally, it would be useful to know if I can use the read syscall for reading from character devices in general. If so, is the input buffered?

Radu Szasz
  • `stdin` is a *stream*, not a *device*. – Paul R Apr 11 '16 at 21:38
  • _"the kernel doesn't care what it is reading from as long as it is a block device or a file on a block device."_ This isn't quite right; there are several types of file descriptors (sockets, unnamed pipes, `signalfd`s, `eventfd`s, `epoll` descriptors, etc.) that do not have an associated file anywhere on the filesystem. What matters is that you have a file descriptor and that it supports reading. – Colonel Thirty Two Apr 11 '16 at 21:49
  • @PaulR Thank you! Edited the question :) – Radu Szasz Apr 11 '16 at 21:49
  • Further to Colonel Thirty Two's comment: I'd say the statement "the kernel doesn't care what it is reading from as long as it is a block device or a file on a block device" is so misleading as to be dead wrong. One of the principal features of Unix and every Unix-like system (including Linux) is that *the kernel doesn't care what it is reading from or writing to, period*. If you've got a valid file descriptor, at all, it probably supports reading. It doesn't matter what kind of a device it's connected to, you're not supposed to care what kind (it's not even easy to find out what kind). – Steve Summit Apr 11 '16 at 21:57
  • You can use `read()` to read from character devices in general (and other file types too). Whether the input is buffered depends on the device: terminals will be line buffered by default (but you can change that programmatically if need be), disk files will have some kernel level buffering, and so on. – Jonathan Leffler Apr 11 '16 at 22:24
  • @JonathanLeffler That's exactly what I wanted to know. Thank you very much! Can you please make that an answer so I can select it as answering my question? – Radu Szasz Apr 11 '16 at 22:30
  • You've got a good answer from Steve; use his. It covers essentially anything I've already said and adds a few details I've not mentioned. Thanks. – Jonathan Leffler Apr 11 '16 at 22:40
  • http://unix.stackexchange.com/questions/60034/what-are-character-special-and-block-special-files-in-a-unix-system – rici Apr 12 '16 at 02:15

2 Answers


The kernel generally does little or no buffering on character devices.

The kernel does a certain amount of buffering when reading from files in filesystems.

You can't say what kind of a device standard input is, because it varies from process to process. By default, fd 0 is usually connected to the user's terminal (the keyboard), which is a character device. But if I say

program < file

then fd 0 is an ordinary file. If I say

program < /dev/hda0

then fd 0 is a block device. And if I worked at it I could probably manage to get fd 0 hooked up to a network socket, too.
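If you want to see which of these cases a process actually got, you can ask the kernel with fstat(). The following is only a minimal sketch: the enum and the function names (fd_kind, classify_fd, fd_kind_name, report_stdin) are made up for illustration, while fstat() and the S_IS* macros are the standard POSIX ones.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical labels for the kinds of file a descriptor can refer to. */
enum fd_kind {
  FD_INVALID, FD_REGULAR, FD_CHAR_DEVICE, FD_BLOCK_DEVICE,
  FD_PIPE, FD_SOCKET, FD_OTHER
};

/* Classify fd using the file-type bits that fstat() reports. */
enum fd_kind classify_fd (int fd) {
  struct stat st;
  if (fstat (fd, &st) < 0)   return FD_INVALID;      /* e.g. fd is not open */
  if (S_ISREG (st.st_mode))  return FD_REGULAR;      /* program < file */
  if (S_ISCHR (st.st_mode))  return FD_CHAR_DEVICE;  /* terminal, /dev/null */
  if (S_ISBLK (st.st_mode))  return FD_BLOCK_DEVICE; /* a disk device */
  if (S_ISFIFO (st.st_mode)) return FD_PIPE;         /* other | program */
  if (S_ISSOCK (st.st_mode)) return FD_SOCKET;
  return FD_OTHER;                                   /* directory, etc. */
}

/* Human-readable name for each label. */
const char *fd_kind_name (enum fd_kind kind) {
  switch (kind) {
    case FD_INVALID:      return "not an open descriptor";
    case FD_REGULAR:      return "regular file";
    case FD_CHAR_DEVICE:  return "character device";
    case FD_BLOCK_DEVICE: return "block device";
    case FD_PIPE:         return "pipe or FIFO";
    case FD_SOCKET:       return "socket";
    default:              return "something else";
  }
}

/* Print what standard input is connected to in this process. */
void report_stdin (void) {
  printf ("fd 0 is: %s\n", fd_kind_name (classify_fd (STDIN_FILENO)));
}
```

Calling report_stdin() from main and running the program once from an interactive terminal, once with `< file`, and once at the end of a pipeline should give a different answer each time, even though the program itself never changes.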

In Linux, there's also /proc/pid/fd/0, but that's not a device, either; it ends up looking like a symlink to the actual device in /dev, whatever it is.


Addendum: whether a particular device is buffered or not really depends on how the driver for that device is written. Any given driver may or may not implement some form of buffering. Furthermore, whether or not the buffering is actually used may end up depending on other factors. (For example, the Unix terminal drivers are all line-buffered by default, but that buffering is turned off if you put the driver into "cbreak" or "raw" mode.) I don't think you can make any general statements saying that character or block devices are or aren't buffered.
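For the terminal case, the line buffering is switched off through the termios interface. Here is a rough sketch of a cbreak-like setup; the two function names are my own, and the choice of VMIN = 1 and VTIME = 0 is just one reasonable configuration, not the only one.

```c
#include <termios.h>
#include <unistd.h>

/* Take the terminal on fd out of canonical (line-buffered) mode.
   The previous settings are stored in *saved so they can be restored.
   Returns 0 on success, -1 if fd is not a terminal or the call fails. */
int disable_line_buffering (int fd, struct termios *saved) {
  struct termios t;
  if (tcgetattr (fd, saved) < 0)
    return -1;                       /* fails with ENOTTY for files and pipes */
  t = *saved;
  t.c_lflag &= ~(tcflag_t) ICANON;   /* stop assembling input into lines */
  t.c_cc[VMIN] = 1;                  /* read() returns once 1 byte is available */
  t.c_cc[VTIME] = 0;                 /* no inter-byte timeout */
  return tcsetattr (fd, TCSANOW, &t);
}

/* Put back the settings captured by disable_line_buffering(). */
int restore_terminal (int fd, const struct termios *saved) {
  return tcsetattr (fd, TCSANOW, saved);
}
```

With this in effect, a read() on the terminal can return after a single keypress instead of waiting for Enter. Restore the saved settings before exiting, otherwise the shell inherits the changed mode.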


Addendum 2: When you start peeling back the layers, it can get pretty complicated. Unix strives mightily (and generally does a very good job) to strike the right balance between do-what-I-mean and keep-it-simple-stupid. For example, if you've got a terminal that's not line-buffered, and you ask for 10 characters, but there are only 3 available, read() will return 3. Which is the right thing, but it suggests that there's still a buffer somewhere, where those three characters accumulated between the time they were typed and the time you read them. Furthermore, if you asked for only 3, but there were 10 available, under some circumstances I think the other 7 would get saved for you, again suggesting a fair amount of kernel-level buffering.
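The same short-read behavior is easy to observe without a terminal. The sketch below uses a pipe as a stand-in (that substitution is my assumption; a terminal in non-canonical mode adds its own VMIN/VTIME rules), and the function name partial_read_demo is made up.

```c
#include <unistd.h>

/* Write 10 bytes into a pipe, then read them back with differently sized
   requests. got[0..2] receive the three read() return values.
   Returns 0 on success, -1 if the pipe could not be set up. */
int partial_read_demo (ssize_t got[3]) {
  int p[2];
  char buf[16];

  if (pipe (p) < 0)
    return -1;
  if (write (p[1], "0123456789", 10) != 10) {
    close (p[0]);
    close (p[1]);
    return -1;
  }

  got[0] = read (p[0], buf, 3);    /* asks for 3 of the 10 waiting bytes */
  got[1] = read (p[0], buf, 10);   /* asks for 10, but only 7 are left */
  close (p[1]);                    /* no writers remain */
  got[2] = read (p[0], buf, 10);   /* nothing left: end of file */
  close (p[0]);
  return 0;
}
```

The three results come out as 3, 7, and 0: the kernel holds on to the unread bytes between calls, and a request larger than what is available is satisfied with whatever is there.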

But in raw mode, I'm pretty sure you can lose characters if you don't read them fast enough. Switching our attention from the terminal driver to network sockets, I had thought that under certain circumstances if you do a read() on a UDP-mode socket, and the actual UDP packet is bigger than your read request, you can lose the rest of the packet there, too. [Although a commenter suggests I may be wrong.] (TCP mode sockets, on the other hand, are obviously hugely buffered!)
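For what it's worth, the datagram case can be tried out directly. This sketch is Linux-oriented and uses an AF_UNIX datagram socket pair as a stand-in for UDP, which is an assumption on my part; as far as I know UDP sockets behave the same way, and the recv(2) man page does say that excess bytes may be discarded depending on the socket type. The function name is made up.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send one 10-byte datagram, then receive it into a 4-byte buffer.
   got[0] is the result of that short receive; got[1] is the result of a
   second, non-blocking receive that looks for any leftover bytes.
   Returns 0 on success, -1 if the sockets could not be set up. */
int datagram_truncation_demo (ssize_t got[2]) {
  int sv[2];
  char buf[16];

  if (socketpair (AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
    return -1;
  if (send (sv[0], "0123456789", 10, 0) != 10) {
    close (sv[0]);
    close (sv[1]);
    return -1;
  }

  got[0] = recv (sv[1], buf, 4, 0);                      /* only room for 4 bytes */
  got[1] = recv (sv[1], buf, sizeof buf, MSG_DONTWAIT);  /* is anything left? */

  close (sv[0]);
  close (sv[1]);
  return 0;
}
```

On Linux the first receive returns 4 and the second fails with -1 (EAGAIN), which suggests the remaining 6 bytes of the datagram were discarded rather than kept for a later call. A pipe or a stream socket would have kept them.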

So, bottom line: the rules can be complicated, and the precise details definitely depend on not only the particular device driver in use, but also potentially myriad other details.

Steve Summit
  • Thank you for your answer! There is still something I am not clear about. Suppose I read something longer than one character, say "ABC", from a character device such as the user's keyboard. The buffer will then contain the string "ABC", all returned at once. Doesn't this imply that the kernel uses the buffer I provided to store the results (and is therefore buffering), and only returns from the syscall once a special character such as '\0' or '\n' is entered or the limit provided is reached? – Radu Szasz Apr 11 '16 at 22:29
  • I was updating my answer just as you were posting that comment. Yes, as Jonathan Leffler reminded us, for "tty" (teletype, i.e. keyboard) devices, the kernel generally maintains a line buffer, so yes, character device input can be buffered. – Steve Summit Apr 11 '16 at 22:32
  • If you request 3 characters via `read()`, then the behaviour still depends on the device. For example, with a terminal or pipe, if there is an `X` and a newline in the input, those 2 characters will be returned (and there'll be no null terminator or anything). If there's nothing in the input, then it will wait for data to appear. If there are 20 characters in the input, then 3 will be read, leaving the other 17 for a later call. Mixing file stream I/O (e.g. `getchar()`) with direct `read()` calls on file descriptor 0 will lead to great confusion. – Jonathan Leffler Apr 11 '16 at 22:39
  • *"hooked up to a network socket, too"* -- that's exactly what [xinetd](https://en.wikipedia.org/wiki/Xinetd) provides. As to reading from terminals, the [termios](http://man7.org/linux/man-pages/man3/termios.3.html) settings (i.e. terminal buffering and translation the kernel provides) complicate the picture a bit (how many bytes a `read()` may return) in raw mode when both `VMIN` and `VTIME` are positive. (See the description under Canonical and noncanonical mode at [man 3 termios](http://man7.org/linux/man-pages/man3/termios.3.html).) – Nominal Animal Apr 11 '16 at 22:55
  • UDP is buffered in the kernel and you don't lose data if you read less than was sent. However, once the kernel buffer is full, which will happen if you read more slowly than data coming in, then the kernel will drop incoming packets. With TCP, the kernel also drops incoming packets if it is out of buffer space, but it has a mechanism to tell the sender how much room there is, allowing the sender to adjust its velocity. – rici Apr 12 '16 at 02:06

There is no real stdin in Unix. The C run-time library defines a symbol stdin that is associated with the first (0th) file descriptor for the process.

By convention, Unix shells set up three files when they create a process. Also by convention they are referred to as stdin, stdout, and stderr.

There is no requirement that a Unix process have these three files. It is entirely possible for you to create your own shell that will in turn create processes without files 0, 1, or 2 open.

The behavior of stdin will depend upon what type of "file" (data stream) it is associated with. Stdin could be mapped to a keyboard or it might be mapped to a file. In either case, you can read data. Only in the latter case could you do an fseek.
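One way to check this at the file descriptor level is to ask for the current offset with lseek(), which fails on descriptors that cannot seek. A small sketch; the function names are made up, and I have only described the pipe and regular-file cases because POSIX leaves seeking on devices implementation-defined.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd supports seeking, 0 if it does not.
   lseek(fd, 0, SEEK_CUR) asks for the current offset without moving it;
   on a pipe, FIFO, or socket it fails with ESPIPE. */
int is_seekable (int fd) {
  return lseek (fd, 0, SEEK_CUR) != (off_t) -1;
}

/* Print whether standard input can be repositioned in this process. */
void report_stdin_seekable (void) {
  printf ("stdin is %sseekable\n", is_seekable (STDIN_FILENO) ? "" : "not ");
}
```

Run with `program < file`, this should report a seekable stdin; at the end of a pipeline it should not.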

user3344003
  • Traditionally, `stdin` was a compile time macro that expanded to `__iob[0]` (one or two underscores, I forget which) and there was no such thing as a symbol `stdin` in the libraries. It was quite a shock when GNU C changed the rules a decade or more ago. Both systems were compliant with the standard; the standard allows a lot of latitude to implementations. – Jonathan Leffler Apr 11 '16 at 22:34