Why do C fundamental types have identifiers with multiple keywords

Question

Apart from the obvious answer ("because the designers made it that way"), why does C/C++ have types whose names consist of multiple keywords, e.g.

long long (int)
short int
signed char

I have some basic knowledge of parsing and have used flex/bison to write a few parsers, and I think this adds considerable complexity to parsing type names. Looking at the C++ grammar in the standard, everything about types really is complicated.

I know that C++ (and C too, I believe) does not specify much about the sizes of the fundamental data types, so types like int_8, uint_8, etc. would not work (although C++11 gave us fixed-width integers).

So why did the developers of the standard agree on multi-word type identifiers when they could have used int, uint and similar?

C++ has it because it started out as an extension of C, which had it from the beginning (except `long long`, which was introduced in C99). Perhaps it's better to rephrase the question to apply to C. — , Sep 25 '16 at 10:07
It can either make the lexical analyzer more complicated, or the parser, but not the type-system itself. And even so, the parsing is not that much more complicated: The parser sees a token `long`, and then checks the next token as usual. Is it an identifier? An asterisk? Another `long` token? Something else valid or invalid? No big deal really. — Some programmer dude, Sep 25 '16 at 10:10
@JoachimPileborg C is tricky to parse, since type names, rather than simple type tokens, decide the phrase structure. Unless you take a strict subset of C, you can't truly separate parsing (of declarations, for instance) and interpretation (of type definitions). — , Sep 25 '16 at 10:15
@Rhymoid It's not really that complicated, once the parser knows what it's dealing with. What's *really* complicated is that if the parser sees an identifier it can't know if it's an expression, a statement or a declaration/definition. It's really not hard in C where it's a simple lookup to see if it's a type-name or something else, but worse in C++ where one could use type-names to introduce either a variable declaration/definition *or* a cast expression *or* the creation of a temporary object. — Some programmer dude, Sep 25 '16 at 10:24
@JoachimPileborg While I agree that C++'s situation is worse (where parsing is undecidable), my point is that C's context-sensitive grammar is unusual for programming languages, which makes parsing C unusually complicated. — , Sep 25 '16 at 10:32

score 5 · Accepted Answer · edited May 23 '17 at 11:48

Speaking in terms of C, why did the developers of the standard agree on multi-word identifiers? It's because that was what the language had at the time of standardisation.

The mandate for the original standard was not to create a new language but to codify existing practice. As per the C89 standard itself:

The Committee evaluated many proposals for additions, deletions, and changes to the base documents during its deliberations. A concerted effort was made to codify existing practice wherever unambiguous and consistent practice could be identified. However, where no consistent practice could be identified, the Committee worked to establish clear rules that were consistent with the overall flavor of the language.

And, from the C99 rationale document:

The original X3J11 charter clearly mandated codifying common existing practice, and the C89 Committee held fast to precedent wherever that was clear and unambiguous. The vast majority of the language defined by C89 was precisely the same as defined in Appendix A of the first edition of The C Programming Language by Brian Kernighan and Dennis Ritchie, and as was implemented in almost all C translators of the time.

Beyond that, each iteration of the standard has valued backward compatibility highly so that code doesn't break. From that same rationale document:

Existing code is important, existing implementations are not. A large body of C code exists of considerable commercial value. Every attempt has been made to ensure that the bulk of this code will be acceptable to any implementation conforming to the Standard. The C89 Committee did not want to force most programmers to modify their C programs just to have them accepted by a conforming translator.

So, while later versions of the standard gave us things like stdint.h with its fixed width integral types, taking away the standard ones like int and long would be a gross violation of that guideline.
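To make the coexistence concrete, here is a minimal sketch (the function name `widths_as_expected` is made up for illustration, and it assumes an implementation that provides the optional `int32_t`). It checks that the fixed-width type is exactly 32 bits, while the classic types only promise minimum ranges:

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if the fixed-width and classic types behave as described. */
int widths_as_expected(void)
{
    /* int32_t, where provided, is exactly 32 bits with no padding bits. */
    int exact = (sizeof(int32_t) * CHAR_BIT == 32);

    /* int and long only have guaranteed minimum ranges, and long is
     * never narrower in range than int. */
    int ordered = (INT_MAX >= 32767) && (LONG_MAX >= INT_MAX);

    return exact && ordered;
}
```

Both families live side by side in the same translation unit, which is what lets old code using `int` and `long` keep compiling unchanged.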

In terms of C++, it's almost certainly a holdover from the earliest days of that language where it was put forward as "C plus classes". In fact, the very early cfront C++ compiler was so named because it took C++ source code and turned that into C before giving it to a suitable C compiler (i.e., a front end for C, hence cfront).

This would have allowed the original author Bjarne to minimise the workload in delivering C++ since the bulk of it was already provided by the C compiler itself.

In terms of parsing a language, it's certainly more difficult to have to process unsigned long int x ^(a) than it is to handle ulong x.

But, given that the compiler already has to handle a large number of optional "modifiers/specifiers" for a variable (e.g., const char * const x), handling a few others is par for the course.
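As a rough illustration of why this is manageable, a parser can simply count the specifier keywords it sees and resolve the combination afterwards. The sketch below is not taken from any real compiler; `struct spec_counts` and `resolve_specifiers` are hypothetical names, and it only covers the integer specifiers (no `float`, `double`, `_Bool`, typedef names or qualifiers):

```c
#include <stddef.h>
#include <string.h>

/* How many times each specifier keyword appeared in one declaration. */
struct spec_counts {
    int n_signed, n_unsigned, n_short, n_long, n_int, n_char;
};

/* Hypothetical helper: fold specifier tokens, given in any order, into one
 * canonical type name. Returns NULL for unknown or invalid combinations. */
const char *resolve_specifiers(const char *const *tokens, size_t count)
{
    struct spec_counts c = {0};

    if (count == 0)
        return NULL;                      /* no implicit int in this sketch */

    for (size_t i = 0; i < count; i++) {
        if      (strcmp(tokens[i], "signed") == 0)   c.n_signed++;
        else if (strcmp(tokens[i], "unsigned") == 0) c.n_unsigned++;
        else if (strcmp(tokens[i], "short") == 0)    c.n_short++;
        else if (strcmp(tokens[i], "long") == 0)     c.n_long++;
        else if (strcmp(tokens[i], "int") == 0)      c.n_int++;
        else if (strcmp(tokens[i], "char") == 0)     c.n_char++;
        else return NULL;                 /* not a specifier we handle */
    }

    /* Reject repeated or contradictory keywords. */
    if (c.n_signed + c.n_unsigned > 1 || c.n_int > 1 || c.n_char > 1 ||
        c.n_short > 1 || c.n_long > 2 || (c.n_short && c.n_long))
        return NULL;

    if (c.n_char) {
        if (c.n_int || c.n_short || c.n_long)
            return NULL;
        return c.n_unsigned ? "unsigned char"
             : c.n_signed   ? "signed char" : "char";
    }
    if (c.n_short)
        return c.n_unsigned ? "unsigned short int" : "short int";
    if (c.n_long == 2)
        return c.n_unsigned ? "unsigned long long int" : "long long int";
    if (c.n_long == 1)
        return c.n_unsigned ? "unsigned long int" : "long int";
    return c.n_unsigned ? "unsigned int" : "int";
}
```

Because only the counts matter, `{"unsigned", "long", "int"}` and `{"int", "long", "unsigned"}` resolve to the same name, and a combination such as `{"short", "long"}` is rejected.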

^(a) Or int long unsigned x or long unsigned x or any of the other type specifiers that end up becoming the singular unsigned long int type. See here for more details.
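A quick way to convince yourself that these spellings name one type is C11's `_Generic`, which selects on the type of an expression. This is just a sketch; the `TYPE_NAME` macro and `spellings_agree` function are names invented for the example:

```c
#include <string.h>

/* Map an expression's type to a printable name (requires C11). */
#define TYPE_NAME(x) _Generic((x),        \
    unsigned long: "unsigned long int",   \
    unsigned int:  "unsigned int",        \
    long long:     "long long int",       \
    default:       "other")

/* Returns 1 when every spelling below yields the same type. */
int spellings_agree(void)
{
    unsigned long int a = 0;
    int long unsigned b = 0;   /* same specifiers, different order */
    long unsigned c = 0;       /* "int" omitted */
    unsigned long d = 0;

    const char *expected = TYPE_NAME(a);
    return strcmp(TYPE_NAME(b), expected) == 0
        && strcmp(TYPE_NAME(c), expected) == 0
        && strcmp(TYPE_NAME(d), expected) == 0;
}
```

All four declarations hit the `unsigned long` association, so the function returns 1.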

Well, there's also something to be gleaned from [Ritchie's account of C's history](https://www.bell-labs.com/usr/dmr/www/chist.html). He describes that B derived from BCPL and introduced storage-class specifiers (`auto`, `static`, etc.), and that C added type specifiers to it. Now you can have `static int`; it's speculation, but I can imagine that that gave rise to a style where it's normal to have more than one word that describes the "type" of a variable. — , Sep 25 '16 at 10:19
I understand why it was standardized this way, but I am curious why it was in the language the whole time. I assume that the developers of the first compiler had issues with parsing `unsigned int`, so why didn't they decide to use just `uint` right away? — Zereges, Sep 25 '16 at 10:37
@Zereges, you would have to ask the author of C that, certainly the earliest versions, pre-standard, had precious little in terms of types (int and char and pointers to them were about it, from memory). Unfortunately, dmr has shuffled off this mortal coil, leaving us the poorer for it, so we can only speculate on what was left behind (which isn't really that much in terms of this specific question). — paxdiablo, Sep 25 '16 at 12:00
Is his first name alone enough to uniquely identify Bjarne Stroustrup? Like Cher, Madonna, and Prince? (RIP -but I had to include at least one guy!) — Jongware, Sep 25 '16 at 12:29
@Rad, the set `{Bjarne,Ritchie,Leonard,Knuth}` (and possibly others) is suitably well known here to warrant no need for other names. That's Leonard Nimoy, by the way, effectively reducing the strength of my argument considerably :-) — paxdiablo, Sep 25 '16 at 12:35
Hah, got you there: `Knuth` is a surname! ... not saying you are wrong there. I've yet to encounter the individual asking "Do you mean *Ben* Knuth, the carpenter over on main street?" You could replace it with "Linus", if need be. — Jongware, Sep 25 '16 at 12:40
Linus gave us Linux (and Git, I think, as well) but nothing else that springs into my forebrain. Knuth gave us a sizable proportion of the entire field of computer science. I would hope the latter was more famous than the former but such is the fickle nature of fame (sic transit gloria mundi and all that stuff) :-) — paxdiablo, Sep 25 '16 at 12:42

score 0 · Answer 2 · answered Sep 26 '16 at 16:13

Adding new reserved words to a language will break any code which happens to use such words as identifiers unless those words are of a form which is reserved for future expansion (e.g. contain two leading underscores, or start with an underscore and a capital letter, etc.)

By contrast, if some particular sequence of reserved words has no defined meaning in any existing implementation, there can be no existing code which uses that sequence of reserved words, and thus no danger of breaking existing code by attaching a new meaning to it.
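A small sketch of the difference, using `uint` and `ulong` as stand-ins for keywords a committee might have been tempted to add (the function names are made up for the example):

```c
/* Valid C: `uint` and `ulong` are ordinary identifiers, not keywords. */
int legacy_total(void)
{
    int uint = 40;    /* would stop compiling if `uint` became a keyword */
    int ulong = 2;    /* likewise for a hypothetical `ulong` keyword */
    return uint + ulong;
}

/* `long long`, standardised in C99, reuses the existing keyword `long`,
 * so no identifier in older code could have collided with it. */
long long forty_bits(void)
{
    return 1LL << 40;
}
```

Code like `legacy_total` is exactly what a new single-word keyword would break, whereas giving meaning to `long long` could not invalidate any program that already compiled.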

supercat


New keywords are added to the languages (consider C++'s `auto`, `final`, ...) – Zereges Sep 27 '16 at 08:01
@Zereges `auto` exists in C, it just has an entirely different meaning. – Sep 27 '16 at 08:09
@Zereges: Keywords are sometimes added in breaking fashion without particularly good justification (`restrict` is probably the worst such addition in C, since it would be a logical identifier name in many contexts, and had almost certainly been used as such prior to C99 taking it over) but for the most part the Committee tries to avoid identifiers that are employed within user code. – supercat Sep 27 '16 at 14:40
