I've been wondering how scanf()/printf() actually work at the hardware and OS levels. Where does the data flow, and what exactly is the OS doing around these times? What calls does the OS make? And so on...
-
What OS are you talking about? – Billy ONeal Aug 10 '09 at 04:44
-
I'm not looking at any particular OS. Linux/Unix types would be just fine. I just want a basic understanding of the situation. – jetru Aug 10 '09 at 05:18
-
They are typically not part of the OS, but part of the C library (libc, glibc, ...). Ultimately, scanf() uses the POSIX read() and printf() uses the POSIX write(). A standard Unix programming book such as Advanced Programming in the UNIX Environment (APUE) should help at this level. – dajobe Aug 10 '09 at 05:39
4 Answers
scanf() and printf() are functions in libc (the C standard library). They ultimately call the read() and write() operating system syscalls respectively, operating on the stdin and stdout streams, which wrap file descriptors 0 and 1 (fscanf and fprintf let you specify the file stream you want to read from or write to).
Calls to read() and write() (and all syscalls) result in a 'context switch' out of your user-level application into kernel mode, where the kernel can perform privileged operations such as talking directly to hardware. Depending on how you started the application, the 'stdin' and 'stdout' file descriptors are probably bound to a console device (such as tty0) or to some sort of virtual console device (like the one exposed by an xterm). read() and write() safely copy the data to/from a kernel buffer called a 'uio'.
The format-string conversion part of scanf and printf does not occur in kernel mode, but in ordinary user mode (inside 'libc'). The general rule of thumb with syscalls is that you switch to kernel mode as infrequently as possible, both to avoid the performance overhead of context switching and for security. You need to be very careful about anything that happens in kernel mode: less code in kernel mode means fewer bugs and security holes in the operating system.
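To make the split concrete, here is a minimal sketch for a POSIX system. The helper name `tiny_printf_fd` and the fixed 256-byte buffer are my own assumptions for illustration; a real libc is considerably more elaborate. The formatting happens entirely in user space, and only the finished bytes are handed to the kernel via write().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not a libc function): format in user space,
   then pass the resulting bytes to the kernel with write().
   Returns the number of bytes written, or -1 on error. */
int tiny_printf_fd(int fd, const char *fmt, ...)
{
    char buf[256];                 /* assumption: output fits in 256 bytes */
    va_list ap;

    va_start(ap, fmt);
    int len = vsnprintf(buf, sizeof buf, fmt, ap);  /* user-mode work only */
    va_end(ap);

    if (len < 0)
        return -1;
    if ((size_t)len >= sizeof buf)
        len = (int)sizeof buf - 1;                  /* output was truncated */

    /* write() may accept fewer bytes than requested, so loop until done. */
    ssize_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, buf + off, (size_t)(len - off));
        if (n < 0)
            return -1;
        off += n;
    }
    return len;
}
```

Calling `tiny_printf_fd(STDOUT_FILENO, "%d items\n", 3)` should behave roughly like an unbuffered printf: one formatting pass in user mode, then (usually) a single syscall.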
By the way, all of this was written from a Unix perspective; I don't know how MS Windows works.

-
The context switch sounds slow if it reads each byte individually. Of course, it doesn't really matter in this day and age, but I'm just interested to know if I'm correct in this understanding. – Ray Hidayat Aug 10 '09 at 04:52
-
The read and write syscalls take the number of bytes to transfer via the UIO as a parameter, so there is no need to make a separate syscall for every single byte. You might also think that simpler input functions like getchar() would need a separate call for each character, but in fact these days libc is a bit cleverer than that and keeps a buffer (inside libc). It can avoid a lot of the performance overhead of context switching by filling up its buffer, then processing a bit of it each time you call getchar() or scanf() until the buffer is empty, and only then making another syscall. – David Claridge Aug 10 '09 at 04:56
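A rough sketch of that buffering idea, assuming a POSIX system. The names `my_getchar_fd`, `my_buf` and `my_syscalls` are made up for this example, and real stdio implementations differ in many details (per-stream buffers, locking, line buffering on terminals).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>    /* EOF */
#include <unistd.h>   /* read, ssize_t */

#define MY_BUFSIZ 1024

static unsigned char my_buf[MY_BUFSIZ];  /* user-space buffer, like libc's */
static size_t my_pos, my_len;            /* next unread byte / bytes held  */
static unsigned long my_syscalls;        /* read() calls, for illustration */

/* Hypothetical getchar(): only enters the kernel when the buffer is empty. */
int my_getchar_fd(int fd)
{
    if (my_pos == my_len) {              /* buffer exhausted: refill it */
        ssize_t n = read(fd, my_buf, sizeof my_buf);
        my_syscalls++;
        if (n <= 0)
            return EOF;                  /* end of input or error */
        my_len = (size_t)n;
        my_pos = 0;
    }
    return my_buf[my_pos++];             /* served from user space */
}
```

With up to 1024 bytes already waiting on the descriptor, 1024 calls to `my_getchar_fd` cost a single read().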
-
On the topic of "it doesn't really matter in this day and age": in fact you'd be surprised how significantly making syscalls all the time would impact performance. Consider that syscalls are *at least* 10 times as slow as a regular function call. If you have a buffer of size 1024 bytes, for example, you are only making 1/1024 as many syscalls. Writing a C implementation of the 'cp' command with and without a buffer is a great example; I'll post it in a few minutes. – David Claridge Aug 10 '09 at 04:59
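For anyone who wants to try this, here is one possible sketch of such a copy routine for POSIX systems. It is not the example promised above; the function name `copy_file` and its return value (the number of read() calls made) are my own choices for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: copy src to dst through a buffer of bufsize bytes.
   Returns the number of read() syscalls made, or -1 on error. */
long copy_file(const char *src, const char *dst, size_t bufsize)
{
    long calls = 0, result = -1;
    char *buf = NULL;
    int in = -1, out = -1;

    if (bufsize == 0)
        return -1;
    in = open(src, O_RDONLY);
    if (in < 0)
        goto done;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0)
        goto done;
    buf = malloc(bufsize);
    if (buf == NULL)
        goto done;

    for (;;) {
        ssize_t n = read(in, buf, bufsize);   /* one syscall per chunk */
        calls++;
        if (n < 0)
            goto done;
        if (n == 0)
            break;                            /* end of file */
        for (ssize_t off = 0; off < n; ) {    /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0)
                goto done;
            off += w;
        }
    }
    result = calls;

done:
    free(buf);
    if (in >= 0)
        close(in);
    if (out >= 0)
        close(out);
    return result;
}
```

For a 4096-byte regular file, `copy_file(src, dst, 1)` makes 4097 read() calls (the last one reports end of file), while `copy_file(src, dst, 1024)` makes 5. Timing the two variants on a large file is a simple way to see the syscall overhead.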
-
I don't know how MS Windows works either. No, seriously. It's a miracle it works at all :-) – paxdiablo Aug 10 '09 at 05:20
-
Wow, this sounds good. How does the OS transfer the bytes from the keyboard to its UIO buffer? The read() and write() calls do that, but from where? How do the bytes get from the keyboard? Through the keyboard driver? – jetru Aug 10 '09 at 05:22
-
read() doesn't know about the keyboard; it sits at a slightly higher layer of abstraction and only knows about the device node it talks to, such as a console device. The driver for that device provides a node in the file system that read() can talk to, and it's the driver that has to be able to actually get characters out of the hardware. – David Claridge Aug 10 '09 at 05:27
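As a small illustration of that abstraction, a program can ask what kind of node sits behind a descriptor with the POSIX fstat() call, yet read() is used the same way on all of them. The helper name `fd_kind` and the strings it returns are assumptions of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical helper: describe what a file descriptor is attached to. */
const char *fd_kind(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return "unknown";              /* e.g. not an open descriptor */
    if (S_ISCHR(st.st_mode))
        return "character device";     /* terminals and consoles land here */
    if (S_ISFIFO(st.st_mode))
        return "pipe or FIFO";         /* e.g. input from `cmd | prog` */
    if (S_ISREG(st.st_mode))
        return "regular file";         /* e.g. input from `prog < file` */
    return "other";
}
```

Calling `fd_kind(0)` in a program started from a terminal would typically report a character device, and the same program with its input piped in would report a pipe. In both cases the scanf() code path in libc is unchanged.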
-
-
Just one detail - it's not safe to assume that read() / write() are used internally. For one, they are POSIX functions, and as such aren't part of the C standard, so depending on the platform completely different low-level functions may be called. – DevSolar Aug 10 '09 at 05:58
-
@DevSolar, absolutely, I was just using a typical Linux system as an example. – David Claridge Aug 10 '09 at 06:08
-
Luckily for the performance of a lot of simple C code, stdio functions like `scanf` and `printf` don't call `read` or `write` every time. They (usually, I'm simplifying) actually maintain a buffer that they fill with the system call as rarely as they can to reduce the number of switches in and out of the kernel. A typical Windows libc implementation works much the same way, but uses the `ReadFile` and `WriteFile` system calls. The details inside the kernel are different, but the basic abstractions and overall data flow are very similar. – RBerteig Aug 10 '09 at 06:42
-
@DevSolar Of course, it differs based on platform. *nix is just a good place to start. I just wanted the idea. It would be cool if someone could add details about what the OS is doing and what the drivers are doing at a lower level. :) – jetru Aug 10 '09 at 10:12
-
@jetru - scanf() is completely ignorant of what the lower levels of the OS do. It's just a wrapper for fscanf( stdin, ... ). That in turn can be implemented in terms of fgetc() calls. That function, in turn, reads from the stream buffer and, if the buffer is exhausted, triggers some OS-specific function to replenish the buffer. (On POSIX systems, read().) What that function does at the driver level depends on the type of stream (file, terminal) and on the OS kernel, and would be an issue only if you want to do *OS* development. Are there any *specific* questions you have there? – DevSolar Aug 12 '09 at 10:50
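The layering described here can be sketched in standard C. `my_scanf` and `read_uint` are hypothetical names used only for this example: the first forwards to vfscanf() on stdin, and the second shows a "%u"-style conversion built on nothing but fgetc() and ungetc(). It ignores overflow, which a real implementation would have to handle.

```c
#include <ctype.h>
#include <stdarg.h>
#include <stdio.h>

/* Layer 1: scanf() as a thin wrapper around the stream version. */
int my_scanf(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    int n = vfscanf(stdin, fmt, ap);   /* all real work happens per stream */
    va_end(ap);
    return n;
}

/* Layer 2: a "%u"-like conversion using only fgetc()/ungetc().
   Returns 1 on success, 0 if the next token is not a number,
   EOF if input ends before any digit. Overflow is not checked. */
int read_uint(FILE *in, unsigned *out)
{
    int c;

    while ((c = fgetc(in)) != EOF && isspace(c))
        ;                              /* skip leading whitespace */
    if (c == EOF)
        return EOF;
    if (!isdigit(c)) {
        ungetc(c, in);                 /* leave it for the next reader */
        return 0;
    }

    unsigned v = 0;
    do {
        v = v * 10 + (unsigned)(c - '0');
        c = fgetc(in);                 /* served from the stream buffer */
    } while (c != EOF && isdigit(c));
    if (c != EOF)
        ungetc(c, in);                 /* push back the first non-digit */

    *out = v;
    return 1;
}
```

Neither function mentions read() or any device: fgetc() decides when the stream buffer needs refilling, and that is the only point where the OS gets involved.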
-
@DavidClaridge: very nice answer. I would like to know how they work on MS Windows. – Destructor Jun 19 '16 at 11:24
On the OS I am working with, scanf and printf are based on the functions getch() and putch().
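A minimal sketch of what a putch()-based printf might look like. This is not the code from the OS mentioned above; the names `set_putch`, `mini_printf` and `sink_putch` are invented for illustration, and the in-memory sink stands in for a real UART or network driver.

```c
#include <stdarg.h>
#include <stdio.h>

typedef void (*putch_fn)(char c);

static putch_fn out_putch;             /* current character output routine */

/* Select where characters go (a UART driver, a socket writer, ...). */
void set_putch(putch_fn fn) { out_putch = fn; }

/* Hypothetical printf: format in a local buffer, then emit each
   character through whichever putch routine is installed. */
int mini_printf(const char *fmt, ...)
{
    char buf[128];                     /* assumption: short messages only */
    va_list ap;

    va_start(ap, fmt);
    int len = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);

    if (len < 0 || out_putch == NULL)
        return -1;
    if ((size_t)len >= sizeof buf)
        len = (int)sizeof buf - 1;     /* output was truncated */
    for (int i = 0; i < len; i++)
        out_putch(buf[i]);
    return len;
}

/* Stand-in for a hardware driver: collects output in memory. */
static char sink[64];
static size_t sink_len;

void sink_putch(char c)
{
    if (sink_len < sizeof sink - 1)
        sink[sink_len++] = c;
    sink[sink_len] = '\0';
}
```

After `set_putch(sink_putch)`, a call like `mini_printf("%s=%d", "x", 7)` leaves `x=7` in `sink`. Swapping in a different putch routine redirects all output without touching the formatting code.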

-
Wow, that was something. I managed to switch the output from UART to TCP/IP when a single client was connected. It was a simple non-preemptive cooperative microkernel for an embedded system. – Luka Rahne Aug 10 '09 at 21:38
I think the OS just provides two streams, one for input and the other for output. The streams abstract away how the output data gets presented and where the input data comes from.
So all scanf and printf do is consume bytes from one stream or add bytes to the other.

-
This is the high level abstraction. I wish to know the details of how these streams work with hardware and how the OS manages all the data. – jetru Aug 10 '09 at 05:23
Functions such as scanf and printf can't be written directly in C/C++. Internally they are all written in assembly language, using the keyword "asm". Anything written with the "asm" keyword is placed directly into the object file, unchanged by compilation, and assembly language has predefined instructions that can implement all of these functions. In short, scanf, printf, etc. are all written in assembly language internally, and you can design your own input function using the "asm" keyword.
