21

I've been using a fair number of file descriptors recently, and I've been wondering why they're implemented as integers.

It means that they're easy to confuse for other integers, and there's no way of knowing without context what they are, what they point to, whether they're open, etc.

In C, FILE is an opaque struct type. Many people also typedef e.g. status_t as an integer so that its purpose is obvious. It seems the best thing would be to implement file descriptors either as an opaque type or (e.g. in C++) as a class that can take care of some of the implementation and also clean up the namespace a bit (a call to pipe() or open() seems so innocuous, and it's not obvious what you're piping or opening without context). Something like std::file_descriptor, with constructors/factory functions for creating pipes, opening files, and so on.

I hope this is on topic for this site; I've tried to phrase it as "Why was this particular decision made?" If anyone knows somewhere it'd fit better, please let me know.

0decimal0
allicoder
  • Why? Because POSIX doesn't define a C++-specific API. – R. Martinho Fernandes Jul 22 '13 at 13:13
  • Do you mean: _why doesn't C++ redefine the C API with a typedef_, or _why does the C API not use a typedef in the first place_? – Useless Jul 22 '13 at 13:14
  • File descriptors need to be universal, `int`s are pretty much as universal as you can get for data types. – daniel gratzer Jul 22 '13 at 13:17
  • If you want a specific type for file descriptor, you can `typedef` it. – GuLearn Jul 22 '13 at 13:19
  • "two billion concurrently-open files should be enough for everybody" ? – FrankH. Jul 22 '13 at 13:39
  • Neither C nor C++ defines any file descriptor type; it is POSIX that currently standardizes this concept (and, by the way, it may or may not exist on any given OS) – PlasmaHH Jul 22 '13 at 13:40
  • @FrankH. Except that when it was fixed, `int` was only 16 bits (and the internal implementation limited it to 19 files). – James Kanze Jul 22 '13 at 14:14
  • @Useless: A `typedef` doesn't create a new type; it merely creates a new name for an existing type. `typedef int filedes_t` wouldn't necessarily solve any of the problems the OP is asking about. – Keith Thompson Jul 22 '13 at 15:38
  • @KeithThompson true, but OP specifically mentioned `typedef` _as well as_ opaque structs. – Useless Jul 22 '13 at 16:25
  • FILE is spelled in upper case because originally it was `#define FILE struct _iobuf` or something like that. Not too opaque, not too transparent. – Kaz Jul 22 '13 at 18:56

3 Answers

27

History, if nothing else. Back in the 1970s, it probably didn't seem like a problem to just use int (and the value was, in fact, an index into a fixed size table). Later, changing it to another type would have broken code.

wallyk
James Kanze
  • 10
    And it still doesn't seem to be a problem :) –  Jul 22 '13 at 13:36
  • @VladLazarenko It's true that I can't remember ever having seen an error due to someone doing something like `++fd`, just because `fd` was an `int`. – James Kanze Jul 22 '13 at 14:13
  • @JamesKanze - I have. It was a "close all the open file descriptors" function, which iterated over what had been a limited number of file descriptors. Then someone decided that since there was a resource limit on the number of file descriptors, hey, change the maximum possible file descriptor to MAX_INT. What could go wrong with that?!? – Julie in Austin May 03 '19 at 20:48
19

Your question can be divided into two:

Why is a POSIX file descriptor an int?

As with most things in already-established tools and libraries, the answer is probably historical reasons. James' answer points this out.

Making the file descriptor opaque is probably a good idea, but not for the reason you mentioned. An opaque type is useful because the underlying type can vary with the platform. For example, on some systems you may want a long long as the file descriptor. However, as it happens, no one anywhere seems to have needed 2 billion files open at the same time, so no one has cared to fix this non-existent problem.

On the other hand, such a thing as typedef int file_descriptor; won't fix any of the problems you mentioned above:

It means that they're easy to confuse for other integers...

If you confuse your variables, I have bad news for you. The compiler won't help you either since file_descriptor and int are the same type, so any operation on one is allowed on the other.
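To make the point concrete, here is a minimal C11 sketch. The names `file_descriptor`, `fd_handle`, `IS_PLAIN_INT`, and the helper functions are made up for illustration; none of them come from POSIX. `_Generic` selects on the static type of its operand, so it can show that the alias is still plain `int`, while a one-member struct is a separate type.

```c
#include <assert.h>

/* Hypothetical alias: a new name for int, not a new type. */
typedef int file_descriptor;

/* Hypothetical wrapper: a struct is a distinct type, even with one int member. */
typedef struct { int raw; } fd_handle;

/* C11: evaluates to 1 if x has type int, 0 for any other type. */
#define IS_PLAIN_INT(x) _Generic((x), int: 1, default: 0)

/* Ordinary integer arithmetic; knows nothing about descriptors. */
static int next_int(int n) { return n + 1; }

int alias_is_plain_int(void) {
    file_descriptor fd = 3;
    /* Accepted without any diagnostic: the alias is int. */
    int nonsense = next_int(fd);
    assert(nonsense == 4);
    return IS_PLAIN_INT(fd);
}

int wrapper_is_plain_int(void) {
    fd_handle h = { 3 };
    /* next_int(h) would be rejected at compile time; the raw value
       has to be extracted on purpose with h.raw. */
    assert(h.raw == 3);
    return IS_PLAIN_INT(h);
}
```

Under these assumptions, `alias_is_plain_int()` yields 1 and `wrapper_is_plain_int()` yields 0. Only the struct version gives the compiler something to check, at the cost of changing how the value is passed around.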

... and there's no way of knowing without context what they are, what they point to, whether they're open, etc.

You can't do that with FILE either. That's why you have functions that query the information you seek and return it, just like with FILE. typedefing the type won't give you any extra information.
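As a rough sketch of what such query functions look like for a raw descriptor, the following uses the POSIX calls `fcntl` and `fstat`. The helper names `fd_is_open`, `fd_is_pipe`, and `query_demo` are hypothetical, and the sketch assumes a single-threaded process so that a closed slot is not reused behind its back.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if fd refers to an open descriptor, 0 otherwise. */
int fd_is_open(int fd) {
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Returns 1 if fd is open and refers to a pipe or FIFO, 0 otherwise. */
int fd_is_pipe(int fd) {
    struct stat st;
    if (fstat(fd, &st) == -1)
        return 0;
    return S_ISFIFO(st.st_mode) ? 1 : 0;
}

/* Self-check: a fresh pipe end is open and is a FIFO;
   after close() it is neither. */
void query_demo(void) {
    int p[2];
    if (pipe(p) == -1)
        return;
    assert(fd_is_open(p[0]) && fd_is_pipe(p[0]));
    close(p[0]);
    close(p[1]);
    assert(!fd_is_open(p[0]) && !fd_is_pipe(p[0]));
}
```

The information lives in the kernel, not in the value you hold, so any type (int, typedef, or struct) would still need calls like these to answer "is it open?" or "what is it?".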

Why doesn't POSIX have a C++ wrapper for it?

In short, because apart from Microsoft, no operating system developer is in love with C++. Windows is barely even POSIX, so there is no hope of Microsoft trying to improve anything in POSIX. Other operating systems that are POSIX-compliant use C as the de facto system programming language and expose a C API (partly because, as n.m. says, almost all languages can bind to C).

In fact, C++ is popular among application developers, but not as much among system programmers. The POSIX committee doesn't seem to be particularly interested in C++ either. That's why you see C, and only C, in solutions and arguments concerning the POSIX API. Also note that POSIX was created to standardize the interface of UNIX in particular, which was written in C, and one of its most important descendants, Linux, is also strongly bound to C.

Community
Shahbaz
  • Another point: when UNIX was first developed, there was no C++ yet. Bjarne was in Europe at that time. – xis Jul 22 '13 at 15:43
  • Most sources agree Java is more popular than C++. Why should POSIX support C++ before Java? (It shouldn't; it defines APIs that *any* language can bind to). – n. m. could be an AI Jul 22 '13 at 16:24
  • @n.m. Which sources? If I look around, Java seems almost moribund except in Web servers and small embedded applications in smart phones. And Java creates its own environment, independently of the surrounding OS. And from what I know, C is mainly used in Linux kernels for historical reasons. Except for Linux, most new Unix development is in C++. (In the Rationale for `pthread_cleanup_push`/`pthread_cleanup_pop`, POSIX even states that the ideal solution would involve exceptions--a strong indication that they'd like C++.) – James Kanze Jul 22 '13 at 16:54
  • [This Wikipedia article](http://en.wikipedia.org/wiki/Measuring_programming_language_popularity) lists several language popularity surveys, Google has more. "Java creates its own environment" --- any language does to some extent. C and C++ have their language-specific I/O facilities too but POSIX mandates a set of its own. "C is mainly used in Linux kernels for historical reasons." -- Perhaps it may have to do with Linus [despising C++ and C++ programmers](http://harmful.cat-v.org/software/c++/linus). "a strong indication that they'd like C++" -- or maybe Objective C, who knows? – n. m. could be an AI Jul 22 '13 at 17:23
  • +1 for a nice answer, and for the nice badge. This is really an awesome answer. Aha, I noticed you have a PhD! That's the reason for the in-depth knowledge. – Grijesh Chauhan Jul 22 '13 at 18:50
  • They may not be big enough to count but BeOS and its open source offspring like Haiku are in love with C++. – Zan Lynx Jul 22 '13 at 20:26
  • @JamesKanze, C is not used in Linux for historical reasons. I'm sure if they wanted to redo it, they would still do it in C. Partly because Linus feels strongly against C++, but that's not without merit either. C seems to be at the exact sweet spot between low level and high level programming. It's low level enough to give you full power yet high level enough not to be a nightmare. With C++, you _could_ write low level code, but then it would just be reduced to C. C is generally accepted as an appropriate language for system programming, but there is no such wide agreement regarding C++. – Shahbaz Jul 23 '13 at 08:49
  • @n.m. The Wikipedia article starts by explaining exactly why those surveys are irrelevant, and don't provide any real answer to the question. (And I meant to say "C is mainly used in _Unix_ kernels for historical reasons". Linus doesn't like C++, which is why it isn't used in Linux, but that isn't true for other Unix shops, where C++ is very appreciated.) – James Kanze Jul 23 '13 at 10:30
  • @Shahbaz I disagree that there is no such wide agreement. Linus is an exception: in most large Unix developments, C++ is preferred for new code. – James Kanze Jul 23 '13 at 10:33
  • @JamesKanze, perhaps the fact that the few of us here can't reach an agreement with respect to how suitable C++ would be for system programming would be a hint why there is no such wide agreement ;) (And if you still disagree, try to find a C++ compiler for a microcontroller) – Shahbaz Jul 23 '13 at 12:14
  • @Shahbaz: _"try to find a C++ compiler for a microcontroller"_ Just a simple googling shows many possibilities... – masoud Sep 27 '13 at 21:31
2

The Unix interface is described in terms of the C language, but it is equally important that systems have an ABI, not only an API.

Fancy data structures specific to a programming language complicate the ABI. At the ABI level, you have only low level data types like "32 bit unsigned integer" and simple aggregates thereof.

That being said, the Unix interface does make use of types like pid_t and whatnot, so why not one of these typedefs for file descriptors?

File descriptors have certain well-known values (0, 1 and 2 for standard input, output and error), and when a new file descriptor is opened, the smallest non-negative value which is available is always used. File descriptor values effectively act as array indices into a table of descriptors, and the design is deliberately that way. The programmer's model of file descriptors is that there is an array-like structure of them in the kernel. The dup2 function can actually duplicate a file descriptor from one slot to another. Such array indices might as well be int, with a negative value for signaling errors.
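The slot behaviour can be observed directly. This is a minimal sketch, assuming a single-threaded process (so nothing else opens descriptors in between); the function names are hypothetical, while `pipe`, `dup`, `dup2`, and `close` are the standard POSIX calls.

```c
#include <assert.h>
#include <unistd.h>

/* Returns 1 if dup() hands back the lowest free slot, 0 if not, -1 on error. */
int lowest_slot_is_reused(void) {
    int p[2];
    if (pipe(p) == -1)
        return -1;
    /* pipe() took the two lowest free slots; sort them. */
    int low  = p[0] < p[1] ? p[0] : p[1];
    int high = p[0] < p[1] ? p[1] : p[0];
    assert(low < high);
    close(low);                    /* free the lower slot */
    int again = dup(high);         /* must land in the lowest free slot */
    if (again == -1) {
        close(high);
        return -1;
    }
    int ok = (again == low);
    close(again);
    close(high);
    return ok;
}

/* Returns 1 if dup2() copies a descriptor into the requested slot,
   0 if not, -1 on error. */
int dup2_targets_slot(void) {
    int p[2];
    if (pipe(p) == -1)
        return -1;
    int low  = p[0] < p[1] ? p[0] : p[1];
    int high = p[0] < p[1] ? p[1] : p[0];
    close(low);                    /* make the target slot empty */
    int placed = dup2(high, low);  /* copy slot high into slot low */
    if (placed == -1) {
        close(high);
        return -1;
    }
    int ok = (placed == low);
    close(placed);
    close(high);
    return ok;
}
```

Both functions return 1 when the descriptor table behaves as described: the value is an index the program can reason about, not an opaque token.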

C typedefs do not buy additional type checking, but they do bring in a little bit of readability and also abstraction: independence from a particular integer type. A fd_t type could be an int on one system and a long on another. But since int became almost ubiquitously 32 bits wide decades ago, there is no real need to abstract it for the sake of being able to make it wider under the same name. It's very unusual for a program to need more than two billion open file descriptors.

By contrast, it would be very inconvenient for implementors if a plain int were used instead of, say, pthread_t.

int descriptors did prove to be difficult to swallow for the designers of the Windows Socket API, who invented a SOCKET typedef, whose values are not the lowest available positive integers; just one of the quirks leading to portability annoyances. However, there is a real semantic difference there in that code which relies on these descriptors being small values in a range will either not work or behave inefficiently.

There are historic instances of Unix having been revised to replace a plain int type with some typedef. For instance, in the accept function, the size of the remote address structure used to be just int. Then it became socklen_t. There is no technical need for socklen_t to exist; it was invented as a band-aid solution to bridge the differences between systems that used the traditional int and ones whose maintainers zealously changed the argument to use size_t. As long as those two types led to the same ABI, there was no problem; that changed with systems that have a 64-bit size_t and a 32-bit int.
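A small sketch of why the width matters: the length is passed by pointer, so caller and callee must agree on how many bytes sit behind that pointer. The helper names below are hypothetical. `getsockname` is used instead of accept only because it needs no connecting peer; it takes the same kind of in/out `socklen_t *` argument.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Widths in bits of the three types that have been used for the length. */
int int_bits(void)     { return (int)(sizeof(int) * CHAR_BIT); }
int size_t_bits(void)  { return (int)(sizeof(size_t) * CHAR_BIT); }
int socklen_bits(void) { return (int)(sizeof(socklen_t) * CHAR_BIT); }

/* Stores the address length that getsockname() reports for one end of a
   local socket pair. Returns 0 on success, -1 on error. */
int local_address_length(socklen_t *out_len) {
    assert(out_len != NULL);
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    struct sockaddr_storage addr;
    socklen_t len = sizeof addr;   /* in: buffer size; out: bytes used */
    int rc = getsockname(sv[0], (struct sockaddr *)&addr, &len);
    close(sv[0]);
    close(sv[1]);
    if (rc == 0)
        *out_len = len;
    return rc;
}
```

On a typical 64-bit Unix, `int_bits()` and `socklen_bits()` agree while `size_t_bits()` is larger, which is exactly the mismatch described above; the actual numbers depend on the platform.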

Community
Kaz
  • Can you link to where you took the quote out of? Reading the rest of the article may be interesting. – Shahbaz Jul 23 '13 at 08:42