It may sound obvious, but just out of curiosity: is program translation direction well-defined (i.e. top-to-bottom, left-to-right)? Is it explicitly defined in the standard?

-
Not sure I understand the question: but **no**. The compiler can do all sorts of *'treachery'* --- it can, for example, replace repeated code in one or more functions by a function it created (a function absent from the source code). – pmg Nov 10 '20 at 19:40
-
The standard defines what a program should look like in order to be properly translated, along with its semantics. It's up to the compiler implementation how to make it conform to the corresponding abstract machine. – Eugene Sh. Nov 10 '20 at 19:41
-
The specification shows the *grammar* for the language. The "direction" depends on this grammar. – Some programmer dude Nov 10 '20 at 19:42
-
It is not explicitly specified. As stated in footnote 6 in [N1570](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf): "Implementations shall behave as if these separate [translation] phases occur, even though many are typically folded together in practice. Source files, translation units, and translated translation units need not necessarily be stored as files, nor need there be any one-to-one correspondence between these entities and any external representation. The description is conceptual only, and does not specify any particular implementation." – John Bode Nov 10 '20 at 19:43
-
There are however many [*phases* of translation](https://en.cppreference.com/w/c/language/translation_phases), which could be seen as doing multiple passes over the source. – Some programmer dude Nov 10 '20 at 19:43
-
There are single pass compilers – 0___________ Nov 10 '20 at 19:44
-
The grammar is designed to work with an LR parser, so it would assume a left-to-right scan (the "R" stands for rightmost derivation). So probably, but as pmg says you can't be sure that a particular compiler would work that way. – Jon Guiton Nov 10 '20 at 20:17
1 Answer
A source file, and the translation unit that results from including headers via the `#include` directive, is implicitly a one-dimensional sequence of characters. I do not see this explicitly stated in the standard, but the standard refers to this dimension in numerous places: characters are “followed by” (C 2018 5.1.1.2 1 1) or come “before” (6.10.2 5) other characters, scopes begin “after” the appearance of a tag or declarator (6.2.1 7), and so on.
A compiler is free to read and compute with the parts of translation units in any order it wants, but the meaning of the translation unit is defined in terms of this start-to-finish order.
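As a small illustration of meaning depending on position in the character stream, here is a sketch in C. The function names (`first_x`, `second_x`, `shadow_demo`) are made up for this example; the only point is that what `X` and `n` mean depends on what has already appeared earlier in the translation unit.

```c
/* The macro X expands according to the most recent definition
   that precedes the point of use in the character stream. */
#define X 1
int first_x(void) { return X; }    /* X is 1 here */
#undef X
#define X 2
int second_x(void) { return X; }   /* X is 2 from this point on */

/* The scope of an identifier begins after its declarator (6.2.1 7),
   so a use that appears earlier in the block still names the outer object. */
int shadow_demo(void)
{
    int n = 10;
    {
        int m = n;      /* outer n: the inner n has not been declared yet */
        int n = 20;     /* inner n is in scope only after this declarator */
        return m + n;   /* 10 + 20 */
    }
}
```

Swapping the two lines inside the inner block, or moving `first_x` below the second `#define`, changes the results, even though a compiler is free to process the text internally in whatever order it likes.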
There is no “up” or “down” between lines. Lines are meaningful in certain parts of C translation, such as the fact that a preprocessing directive ends with a new-line character. However, there is no relationship defined between the same columns on different lines, so the standard does not define anything meaningful for going up or down by lines beyond the fact that this means going back or forth in the stream of characters by some amount.
The standard does allow that source files might be composed of lines of text, perhaps with Hollerith cards (punched cards) in which each individual card is a line or with fixed-length-record files in which each record is a fixed number of bytes (such as 80) and there are no new-line characters physically recorded in the file. (Implicitly, each line of text ends after 80 characters.) The standard treats these files as if a new-line character were inserted at the end of the line (5.2.1 3), thus effectively converting the file to a single stream of characters.
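To make the fixed-length-record case concrete, here is a minimal sketch of the conversion described above. The helper `records_to_stream` and its parameters are hypothetical; the standard only says the file is treated as if a new-line character ended each line, and says nothing about how an implementation actually does it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `count` records of `reclen` bytes each from
   `records` into `out`, appending '\n' after every record so the result is
   a single stream of characters.
   `out` must have room for count * (reclen + 1) bytes.
   Returns the number of bytes written. */
size_t records_to_stream(const char *records, size_t count,
                         size_t reclen, char *out)
{
    size_t written = 0;
    for (size_t i = 0; i < count; i++) {
        memcpy(out + written, records + i * reclen, reclen);
        written += reclen;
        out[written++] = '\n';   /* the "as if" new-line at end of line */
    }
    return written;
}
```

For 80-column cards, `reclen` would be 80. Once the new-lines are in place, the rest of translation sees only a one-dimensional stream.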

-
I remember looking long and hard for this in the standard too, but without finding it actually specified anywhere. Which is odd, since a lot of things depend on it. Such as `#define X... #undef X` etc. I'm not even sure there's a left or right order specified, though things like "the maximal munch rule" (6.4/4) would surely break if the order wasn't left to right. – Lundin Nov 13 '20 at 12:14
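A quick sketch of the maximal munch point, using a made-up function name `munch_demo`: because the next preprocessing token is the longest sequence of characters that could form one, scanning start to finish, `a+++b` is tokenized as `a ++ + b` rather than `a + ++b`.

```c
/* Returns sum * 10 + a so both effects are visible in one value. */
int munch_demo(void)
{
    int a = 1, b = 2;
    int sum = a+++b;       /* tokenized as (a++) + b: sum is 3, a becomes 2 */
    return sum * 10 + a;   /* 3 * 10 + 2 */
}
```

Had the tokens been formed from the other end, the expression would read as `a + (++b)` and leave `a` unchanged, which is why the rule only makes sense with a defined scanning order.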