Clarification on how pipe() and dup2() work in C

Question

I am writing a simple shell that handles piping. I have working code, but I don't quite understand how it all works under the hood. Here is a modified code snippet I need help understanding (I removed error checking to shorten it):

```
int fd[2];
pipe(fd);

if (fork()) { /* parent code */
    close(fd[1]);
    dup2(fd[0], 0);

    /* call to execve() here */

} else { /* child code */
    close(fd[0]);
    dup2(fd[1], 1);
}
```

I have guesses for my questions, but that's all they are - guesses. Here are the questions I have:

1. Where is the blocking performed? In all the example code I've seen, read() and write() provide the blocking, but I didn't need to use them here. I just copy STDIN to point at the read end of the pipe and STDOUT to point at the write end of the pipe. What I'm guessing is happening is that STDIN is doing the blocking after dup2(fd[0], 0) is executed. Is this correct?
2. From what I understand, there is a descriptor table for each running process that points to the open files in the file table. What happens when a process redirects STDIN, STDOUT, or STDERR? Are these file descriptors shared across all processes' descriptor tables? Or are there copies for each process? Does redirecting one cause changes to be reflected among all of them?
3. After a call to pipe() and then a subsequent call to fork() there are 4 "ends" of the pipe open: a read and a write end accessed by the parent and a read and a write end accessed by the child. In my code, I close the parent's write end and the child's read end. However, I don't close the remaining two ends after I'm done with the pipe. The code works fine, so I assume that some sort of implicit closing is done, but that's all guesswork. Should I be adding explicit calls to close the remaining two ends, like this?
```
int fd[2];
pipe(fd);

if (fork()) { /* parent code */
    close(fd[1]);
    dup2(fd[0], 0);

    /* call to execve() here */

    close(fd[0]);

} else { /* child code */
    close(fd[0]);
    dup2(fd[1], 1);
    close(fd[1]);
}
```
4. This is more of a conceptual question about how the piping process works. There is the read end of the pipe, referred to by the file handle fd[0], and the write end of the pipe, referred to by the file handle fd[1]. The pipe itself is just an abstraction represented by a byte stream. The file handles represent open files, correct? So does that mean that somewhere in the system, there is a file (pointed at by fd[1]) that has all the information we want to send down the pipe written to it? And that after pushing that information through the byte stream, there is a file (pointed at by fd[0]) that has all that information written to it as well, thus creating the abstraction of a pipe?

`execve` doesn't return unless there is an error; call `close(fd[0])` *before* `execve`. — jfs, Jan 09 '14 at 23:48

Nicholas Wilson · Accepted Answer · 2014-01-10T09:16:12.443

1. Nothing in the code you've provided blocks. fork, dup2, and close all return immediately, so execution does not pause anywhere in the lines you've posted. If you're observing any waiting or hanging, it's elsewhere in your code (e.g. in a call to waitpid, select, or read).
2. Each process has its own file descriptor table. The file objects are global across all processes (and a file in the file system may be open multiple times, with different file objects representing it), but file descriptors are per-process: they are how each process references the file objects. So a file descriptor like "1" or "2" only has meaning within your process; "file number 1" and "file number 2" probably mean something different to another process. It is still possible for several processes to reference the same file object (for example after fork), although each might have a different number for it.

   That's why there are two sets of flags you can set: the file descriptor flags (FD_CLOEXEC), which are not shared, and the file object flags (such as O_NONBLOCK), which are shared, even between processes.

   Unless you do something weird like freopen on stdin/stdout/stderr (rare), they're just synonyms for fds 0, 1, 2. When you want to write raw bytes, call write with the file descriptor number; if you want to write formatted strings, call fprintf with stdout or stderr. They go to the same place.
3. No implicit closing is done (other than when the process exits); you're just getting away with it. Yes, you should close file descriptors when you're done with them. Technically, I'd write if (fd[0] != 0) close(fd[0]); just to make sure! If the pipe's read end already is descriptor 0, dup2 does nothing and an unconditional close would shut your new stdin.
4. Nope, there's nothing written to disk. It's a memory-backed file, which means the buffer lives in kernel memory and is never stored on disk. When you write to a "regular" file on disk, the written data is stored by the kernel in a buffer and then passed on to the disk to be committed. When you write to a pipe, it goes to a kernel-managed buffer just the same, but it won't normally go to disk. It just sits there until it's read from the reading end of the pipe, at which point the kernel discards it rather than saving it.

   The pipe has a read end and a write end: written data always goes at the tail of the buffer, and data that's read is taken from the head of the buffer and then removed. So there's a strict ordering to the flow, just like in a physical pipe: the water drops that go in one end first come out first at the other end. If the tap at the far end is closed (the process is not reading), then once the pipe is full you can't push (write) more data into your end. If no data is being written and the pipe empties, a read has to wait until more data comes through.
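The distinction in point 2 between per-descriptor flags and per-file-object flags can be checked with a small sketch. The helper names below (`has_cloexec`, `has_nonblock`, `flags_after_dup`) are made up for illustration; only `pipe`, `dup`, and `fcntl` are real calls. The sketch sets both kinds of flag on the read end of a pipe, duplicates the descriptor, and reports which flags the duplicate sees:

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if the FD_CLOEXEC descriptor flag is set on fd, 0 if not, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags == -1 ? -1 : (flags & FD_CLOEXEC) != 0;
}

/* Returns 1 if the O_NONBLOCK file status flag is set on fd, 0 if not, -1 on error. */
int has_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags == -1 ? -1 : (flags & O_NONBLOCK) != 0;
}

/* Hypothetical demo: set both flags on a pipe's read end, dup() it, and
 * store what the duplicate descriptor reports. Returns 0 on success. */
int flags_after_dup(int *dup_cloexec, int *dup_nonblock)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    int rc = -1;
    int copy = -1;
    int fl = fcntl(fd[0], F_GETFL);
    if (fl != -1 &&
        fcntl(fd[0], F_SETFD, FD_CLOEXEC) != -1 &&      /* descriptor flag  */
        fcntl(fd[0], F_SETFL, fl | O_NONBLOCK) != -1 && /* file object flag */
        (copy = dup(fd[0])) != -1) {
        *dup_cloexec = has_cloexec(copy);   /* new descriptor: flag cleared  */
        *dup_nonblock = has_nonblock(copy); /* same file object: flag shared */
        rc = 0;
    }

    if (copy != -1)
        close(copy);
    close(fd[0]);
    close(fd[1]);
    return rc;
}
```

Calling `flags_after_dup(&c, &n)` should leave `c` at 0 and `n` at 1: the duplicate is a new entry in the descriptor table, so it starts without FD_CLOEXEC, but it refers to the same file object and therefore sees O_NONBLOCK.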

Okay, I understand the first 3 answers. I'm still confused on the 4th one however. It sounds to me that calling this a pipe is a little misleading. It seems akin to Person A dumping water in a bucket, and Person B filling up their cup from that same bucket that Person A dumped water into. Ends on the pipe are just abstractions - there are no ends, just designated memory backed files to push data and pull data from the same source. Is this a correct understanding? Also, where is the buffer located if not on the disk? Memory? The cache? — instagatorTheCheese, Jan 10 '14 at 04:49
Actually, I lied about fully understanding #3. How come I can get away with not closing all the ends of the pipe? It seems that I should be in a perpetual state of blocking. Once the data is read from the buffer, is it removed? Because if so, then shouldn't the kernel just go back to blocking since it's an empty buffer? How can execution continue without closing everything? How am I, as you put it, "getting away with it"? — instagatorTheCheese, Jan 10 '14 at 05:39
Answer 2,3,4 expanded. Why shouldn't execution continue without closing everything? On which line of your code would you expect execution to pause? The processor just carries on executing, line by line; execution won't stop until the kernel pauses you when you make a system call it can't reply to immediately (eg you say "give me some data" and there isn't any, so it pauses you until it can reply with some data). — Nicholas Wilson, Jan 10 '14 at 09:19
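To see where the kernel actually pauses a process, and where an unclosed write end starts to matter, here is a minimal sketch. The function name `pipe_roundtrip` is hypothetical and error handling is reduced to the essentials. The child writes a short message into the pipe; the parent reads until `read` returns 0, which happens only once every write end of the pipe has been closed:

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: send msg from a child to the parent through a pipe.
 * Returns the number of bytes the parent read, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outsz)
{
    int fd[2];
    if (outsz == 0 || pipe(fd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {                 /* child: the writer */
        close(fd[0]);               /* unused read end */
        size_t len = strlen(msg);
        /* write() would pause the child only if the pipe buffer were full */
        ssize_t w = write(fd[1], msg, len);
        close(fd[1]);
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    /* parent: the reader */
    close(fd[1]);   /* if this write end stayed open, read() would never return 0 */
    size_t total = 0;
    ssize_t n;
    /* read() pauses the parent while the pipe is empty and a writer still exists */
    while (total < outsz - 1 &&
           (n = read(fd[0], out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';

    close(fd[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

The only places this code can wait are the `read`, `write`, and `waitpid` calls. Removing the parent's `close(fd[1])` would leave the loop stuck in `read` after the message arrives, because the parent itself would still hold a write end and end-of-file would never be reported.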

Nicola Musatti · Answer 2 · 2014-01-10T10:38:34.970

First of all, you usually call execve or one of its sister calls in the child process, not in the parent. Remember that the parent learns its child's PID from fork() and can wait for it, whereas a child cannot wait on its parent.
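A common way to arrange this in a shell is to fork one child per command and keep the parent free to wait. The sketch below is only one possible layout, not the asker's code: `run_pipeline` is a hypothetical name, and it uses `execvp` (a sister call of `execve` that searches PATH) purely for convenience.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "left | right" and return right's exit status,
 * or -1 on failure. Both argv arrays must be NULL-terminated. */
int run_pipeline(char *const left[], char *const right[])
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    pid_t lpid = fork();
    if (lpid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (lpid == 0) {                    /* first child: writes into the pipe */
        if (dup2(fd[1], STDOUT_FILENO) == -1)
            _exit(126);
        close(fd[0]);                   /* close both originals before exec */
        close(fd[1]);
        execvp(left[0], left);
        _exit(127);                     /* reached only if exec failed */
    }

    pid_t rpid = fork();
    if (rpid == 0) {                    /* second child: reads from the pipe */
        if (dup2(fd[0], STDIN_FILENO) == -1)
            _exit(126);
        close(fd[0]);
        close(fd[1]);
        execvp(right[0], right);
        _exit(127);
    }

    /* parent: it uses neither end, so it closes both; otherwise the
     * reader would never see end-of-file */
    close(fd[0]);
    close(fd[1]);

    int status = 0;
    waitpid(lpid, NULL, 0);
    if (rpid == -1 || waitpid(rpid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With `left` set to `{"echo", "hello", NULL}` and `right` set to `{"cat", NULL}`, the call prints the text through the pipe and returns the exit status of `cat`, assuming both programs are on PATH. Because the parent never calls exec, it survives to run the next command, which is what a shell needs.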

Underneath, a pipe is really a buffer managed by the operating system, which guarantees that a write to it blocks if the buffer is full and that a read from it blocks if there is nothing to read. This is where the blocking you experience comes from.
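Both conditions can be observed without hanging the program by switching the pipe to non-blocking mode, where the kernel returns an error at exactly the points it would otherwise block. This is a rough sketch with hypothetical helper names (`set_nonblocking`, `empty_read_would_block`, `pipe_capacity`); the capacity it reports is system-dependent.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Set O_NONBLOCK on fd; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Returns 1 if reading an empty pipe (write end still open) would block,
 * 0 if not, -1 on setup error. */
int empty_read_would_block(void)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    int result = -1;
    char c;
    if (set_nonblocking(fd[0]) == 0) {
        ssize_t n = read(fd[0], &c, 1);
        result = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
    }
    close(fd[0]);
    close(fd[1]);
    return result;
}

/* Fills a pipe until a write would block; returns the bytes accepted,
 * or -1 on error. */
long pipe_capacity(void)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    long total = -1;
    if (set_nonblocking(fd[1]) == 0) {
        char chunk[512] = {0};
        ssize_t n;
        total = 0;
        while ((n = write(fd[1], chunk, sizeof chunk)) > 0)
            total += n;
        /* anything other than "would block" is a real error */
        if (n == -1 && errno != EAGAIN && errno != EWOULDBLOCK)
            total = -1;
    }
    close(fd[0]);
    close(fd[1]);
    return total;
}
```

In the default blocking mode the same `read` and `write` calls would simply put the process to sleep at those points instead of failing with EAGAIN.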

In the good old days, when buffers were small and computers were slow, you could actually rely on the reading process being awoken intermittently, even for smallish amounts of data, say in the order of tens of kilobytes. Now in many cases the reading process gets its input in a single shot.

edited Jan 10 '14 at 10:38

answered Jan 10 '14 at 00:02


Gotcha. I didn't think about that for `execve`. Also, thanks for the clarification on where the blocking comes from - that was really stumping me! It was pretty much the only thing Nicholas Wilson left out of his answer. – instagatorTheCheese Jan 10 '14 at 04:53
Also, does that mean that `read()` and `write()` don't actually do any blocking themselves? Is it just the kernel that blocks reading from an empty buffer and writing to a full buffer? – instagatorTheCheese Jan 10 '14 at 05:09
`read()` and `write()` are system calls implemented by the kernel, so yours is sort of a moot question. In any case, that's the standard way of working for those functions: when you write to a disk file, your process is likely to produce output more quickly than the disk can accept it. Analogously, in an interactive, text-based program, when you read user input your process is made to wait until the user types some characters, possibly until `return` is pressed. – Nicola Musatti Jan 10 '14 at 10:34
