
I want to know how the C preprocessor handles circular dependencies (of #defines). This is my program:

#define ONE TWO 
#define TWO THREE
#define THREE ONE

int main()
{
    int ONE, TWO, THREE;
    ONE = 1;
    TWO = 2;
    THREE = 3;
    printf ("ONE, TWO, THREE = %d,  %d, %d \n",ONE,  TWO, THREE);
}

Here is the preprocessor output. I'm unable to figure out why the output is as such. I would like to know the various steps a preprocessor takes in this case to give the following output.

# 1 "check_macro.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "check_macro.c"

int main()
{
 int ONE, TWO, THREE;
 ONE = 1;
 TWO = 2;
 THREE = 3;
 printf ("ONE, TWO, THREE = %d,  %d, %d \n",ONE, TWO, THREE);
}

I'm running this program on linux 3.2.0-49-generic-pae and compiling in gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5).

deepak

5 Answers


While a preprocessor macro is being expanded, that macro's name is not expanded. So all three of your symbols are defined as themselves:

ONE   -> TWO -> THREE -> ONE    (ONE not expanded: expansion of ONE in progress)
TWO   -> THREE -> ONE -> TWO    (TWO not expanded: expansion of TWO in progress)
THREE -> ONE -> TWO -> THREE    (THREE not expanded: expansion of THREE in progress)

This behaviour is set by §6.10.3.4 of the C standard (section number from the C11 draft, although as far as I know, the wording and numbering of the section is unchanged since C89). When a macro name is encountered, it is replaced with its definition (and # and ## preprocessor operators are dealt with, as well as parameters to function-like macros). Then the result is rescanned for more macros (in the context of the rest of the file):

2/ If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced. Furthermore, if any nested replacements encounter the name of the macro being replaced, it is not replaced…

The clause goes on to say that any token which is not replaced because of a recursive call is effectively "frozen": it will never be replaced:

… These nonreplaced macro name preprocessing tokens are no longer available for further replacement even if they are later (re)examined in contexts in which that macro name preprocessing token would otherwise have been replaced.

The situation to which the last sentence refers rarely comes up in practice, but here is the simplest case I could think of:

#define two one,two
#define a(x) b(x)
#define b(x,y) x,y
a(two)

The result is one, two. two is expanded to one,two during the replacement of a, and the expanded two is marked as completely expanded. Subsequently, b(one,two) is expanded. This is no longer in the context of the replacement of two, but the two which is the second argument of b has been frozen, so it is not expanded again.

rici
  • +1, excellent answer. [Here's an example](http://stackoverflow.com/questions/24177503/how-does-the-c-preprocessor-handle-circular-dependencies/24204531#24204531) that I feel nicely demonstrates this behavior (but, alas, is a bit too long for a comment). – Ilmari Karonen Jun 13 '14 at 11:46
  • 1
    @IlmariKaronen: I added an example for the last sentence of paragraph 2, which is otherwise a bit tricky to understand. But rereading your comment/answer, I don't think that was what you were aiming at, so there's no need to say that your example is roughly the same as the OP, although the result is possibly a little more visual. – rici Jun 13 '14 at 21:36

Your question is answered by publication ISO/IEC 9899:TC2 section 6.10.3.4 "Rescanning and further replacement", paragraph 2, which I quote here for your convenience; in the future, please consider reading the specification when you have a question about a standardized language.

If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced. Furthermore, if any nested replacements encounter the name of the macro being replaced, it is not replaced. These nonreplaced macro name preprocessing tokens are no longer available for further replacement even if they are later (re)examined in contexts in which that macro name preprocessing token would otherwise have been replaced.

Eric Lippert
  • To be fair, finding and understanding the answer in the C standard is not a trivial task. With your "go read the standard" logic, we could answer every single question related to C with RTFM. – Lundin Jun 12 '14 at 06:54
  • 13
    @Lundin: The specification begins with a table of contents which clearly identifies which section of the specification is about macro expansion; it took me all of 30 seconds to find the correct paragraph, and I am no expert on the C specification. And yes, with my excellent suggestion that people *actually read the standard when they have a question about a standardized language*, most of the bad questions in this tag would go away. That's a good thing. – Eric Lippert Jun 12 '14 at 07:02
  • 22
    Except this isn't a bad question. The OP has done some research, included an example that compiles and the pre-processor output, specified the compiler and system etc. And there seems to be no obvious duplicates of the question. Again, reading the C standard is no trivial task. For example, you didn't manage to do so. You are for some reason citing a draft N1124 to ISO 9899:1999 TC2, which has since then been replaced with C99+TC2, C99+TC3 draft N1256, C99+TC3, C11, C11+TC1. Though I'm sure you know all the changes of macro rescanning through these revisions... – Lundin Jun 12 '14 at 11:46
  • @Lundin: Whoa! I for one totally missed the existence of TC3. Its existence shocks me, since the current ISO directive 2.10.4 prohibits more than 2 TCs. I guess that is a new rule. Anyway, if Eric also missed that, and felt that referring to a C99 document was more appropriate (it having been the controlling standard for longer than C11), then his choice of link is understandable, since that draft is publicly available, and is consolidated, unlike the official TC2, which is a royal pain to use unless you use it (and TC 1) to hand-correct a printed copy of C99. – Kevin Cathcart Jun 12 '14 at 13:14
  • 1
    @KevinCathcart All of this was my point: you'd have to be a complete standard nerd to keep track of all these things. Reading a technical ISO standard is not exactly a breeze, and it not something you can expect the average programmer to do. – Lundin Jun 12 '14 at 13:27
  • 9
    @Lundin: As far as bad questions go, I see 100x worse questions here every day, so yes, it's pretty good. I chose that version of the standard because it's easily found -- it's linked from Wikipedia -- and its free, and most compiler comply to it. As I said, I am not an expert at all on the history or contents of the C specification; my point is that I managed to find an answer to the question with one web search and one glance at the table of contents; this is not out of the realm of possibility for the average programmer. I encourage spec reading to be in the toolbox of all programmers. – Eric Lippert Jun 12 '14 at 13:47
  • 1
    @EricLippert I kind of agree with Lundin here. 'Average programmer' is not used to reading specifications. I think part of the reason is that specification language can be quite dense and hard to understand. E.g. what did you search to end up with section 6.10.3.4? It is not obvious to me at all. When I go to the spec, I will have to go from the beginning of 6.10.3 and read whole bunch of things before I find the relevant section. – SolutionYogi Jun 12 '14 at 14:26
  • On the other hand, this particular page (linked by R Sahu) https://gcc.gnu.org/onlinedocs/cpp/Self-Referential-Macros.html#Self-Referential-Macros is much easier to understand AND has the keywords I would search for. – SolutionYogi Jun 12 '14 at 14:27
  • @SolutionYogi "When I go to the spec, I will have to go from the beginning of 6.10.3 and read whole bunch of things before I find the relevant section" **Only the first time.** Next time you go, you'll have a much easier go of it. If it's the case that " 'Average programmer' is not used to reading specifications," you can become a better-than-average programmer by becoming used to it. Just like coding, the first times are hard, but it gets easier. – Joshua Taylor Jun 12 '14 at 16:36
  • Personally I find the RTFM replies useless because if the person would be the kind to RTFM, he will RTFM by himself next time, if not your comment is unlikely to make him RTFM. – BlueTrin Jun 12 '14 at 16:49
  • Given that compilers almost always implement a hodgepodge of features from different standards, the fact that a particular version of a standard says something doesn't always imply that's what a compiler will actually do. If all compilers which were considered usable have always done something a certain way, and the standard says it must be done that way, odds are future compilers will keep doing it that way, but if variations have existed it may be useful to know about them. – supercat Jun 12 '14 at 17:01
  • @supercat: If you start with a given version of the standard, though, you know how it's *supposed* to act...and any compiler that claims to support that version, but acts significantly different, is basically broken. An answer can't cover every compiler ever made (or even every nonstandard variation of behavior), and that wouldn't really be useful in the long run anyway. – cHao Jun 12 '14 at 21:55
  • @cHao: True, but how many compilers are even up to C11 yet? Noting that a standard presently says something would be more helpful if one noted *when* that standard became applicable. – supercat Jun 12 '14 at 22:06
  • I believe this behavior was already nailed down properly in C90. This answer seems to suggest that it was fixed in a rather late TC. – Kaz Jun 13 '14 at 01:40
  • Isn't SO all about being an easily-referrable source for broadly applicable questions like this one? I think this is great information to have indexed here. Not everyone will (or should!) go refer to the standard when they have a question like this, nor will that do anything for people who learn a great deal by stumbling from question to question here. – Chris Hayes Jun 13 '14 at 03:45
  • @SolutionYogi If the average programmer isn't used to reading specifications, then the average programmer doesn't write working code; it's impossible to write properly working code (especially C) without doing so. – Alice Jun 13 '14 at 05:35
  • 1
    @Alice: It's quite possible to write code that works without knowing why it works. Probably 75% of all PHP programmers do it every day. :P C definitely presents more of a minefield, but you might be shocked at how much real, working C code out there is entirely due to dumb luck. (Blind pigs and acorns, yadda yadda.) – cHao Jun 13 '14 at 21:14
  • @cHao It's not possible, because there is no spec or anything to determine what "works" means. It may indeed get a result you want some of the time, but it's impossible to verify it will all the time because "what you want" is not known. In order to call something working, you *must determine what working means first*. – Alice Jun 14 '14 at 02:21
  • 1
    @Alice: That's easy, even without a formal spec. "Works" means "does what you want it to do". :P Whether it works *correctly* might be impossible to determine without formal definitions, but whether it does the job it was written for isn't -- either it does the job or it doesn't. – cHao Jun 14 '14 at 04:07
  • @cHao "Does what you want it to do" is not possible to ascertain; unless you have a spec (and therefore a formal "proof" of what it should do), all you can prove is "it does what I want *in these specific cases*. There is always the chance it will break *in a case you have not yet tested*. That's the problem with informal analysis and inductive reasoning; it's brittle. – Alice Jun 14 '14 at 05:52
  • @Alice: And if "*these specific cases*" are all you need it for, then it works. There's not always a need for a full-on spec, and there are cases where you won't have one anyway. The lack of one doesn't make it impossible to make stuff that works. If it did, a whole lot of software wouldn't even exist. – cHao Jun 14 '14 at 08:12
  • 1
    @Alice: And frankly, outside of FP, formal verification is a pipe dream. The best you're ever going to get in the real world is to say "this code does what i want in this given set/range of cases". And a huge portion of the time, you won't even get *that*. – cHao Jun 14 '14 at 08:22
  • @cHao That sounds both unnecessarily pessimistic and trivially false. Formal verification is trivial for a wide class of mutable concerns, and in fact is the only way to validate many classes of algorithm (notably parallel ones). Take the STL map, which is usually based off an RB tree. Given there is a formal spec (which is well proven), it is easy to derive from the inductive proofs exactly which tests should be performed, and what their results shall be. These tests prove, beyond any shadow of a doubt, that the RB tree works in every case. This is intro CS sort of stuff, not a pipe dream. – Alice Jun 14 '14 at 17:09
  • @Alice: Congrats. Now prove that Company X's implementation is correct, without being able to read their source code. In real-world software, you might be connecting dozens, sometimes hundreds, of other people's unproven modules together, exchanging data with other systems you have no control over, and/or running on real-world hardware that's nowhere near as well-behaved as intro CS stuff likes to assume it is. (Though I should hope your CS curriculum covered the Byzantine Generals Problem too.) Formal verification is no longer feasible. And yet, somehow, the resulting program still works. – cHao Jun 15 '14 at 03:05
  • @cHao Have you not heard of unit testing and black box testing? The need to read source code does not change the problem, and none of that makes formal verification unfeasible. I find it a little insulting you make assumptions, both about me and about the entire field, when functional programming and the Haskell correspondence prove that what you call impossible is not only feasible, but in many cases, trivial. The program does not work if it is not proven to work; it may clank along for many years, but that is meaningless. – Alice Jun 25 '14 at 21:12
  • 2
    @Alice: Though cHao could perhaps have expressed themselves more elegantly, the point is well taken. I am basically unconcerned with the correctness proofs of hundred line methods in library code that has a clear specification; I'm concerned about the correctness of entire operating systems, entire databases, and so on, operating in a world with weak memory models, memory-unsafe languages and so on. The techniques that you use to prove the correctness of the STL collections do not scale to the entire Windows OS. – Eric Lippert Jun 25 '14 at 22:20
  • @EricLippert Untrue. L4 is a completely proven kernel; while it is written in Haskell, for compilation Haskell can (and used to always, for the GHC compiler which is the forefront compiler) compile to C code (for portability reasons as well as because the LLVM didn't exist at that time). This directly implies that it is possible to use the same techniques in C, which directly implies that a kernel can be proven sound in C (perhaps by using strict quality control and code generation). This is a (loose) proof that such techniques could, in fact, scale to Windows level. – Alice Jun 28 '14 at 11:37
  • @EricLippert After all, Windows is merely a kernel (which is getting more microkernel-esque as time goes on) with a supporting layer of services (daemons), libraries (which you admit can be proven), and programs (many of which largely depend on libraries). As for databases, we already have a way to prove they are correct; normal forms and strict adherence to the relational algebra provide strong guarantees that the database is correct. If your database uses a language not strictly based on relations (such as SQL), and you want hard proofs, switch to a more complete database solution. – Alice Jun 28 '14 at 11:39
  • 1
    @Alice: Unit tests and black-box tests can only prove the *presence* of bugs, not their absence. If they pass, then they demonstrate that at best, the code *appears* to work in certain scenarios. They can't prove that the code actually works correctly, though, *even in those scenarios*. That's why i'm making the assumptions that i am. You are either vastly overestimating what testing can do, or are severely mistaken about what constitutes "proof". – cHao Jul 01 '14 at 21:10
  • @Alice: As for L4 being "completely proven", sorry, but no. *One particular kernel implementation* (seL4) has been formally verified. For single-core ARM. (SMP and x86 are unverified). And they don't apply to the really useful userspace stuff -- just the kernel proper. Which means that the instant you want to do anything besides talk to yourself, you're again running on unproven code. – cHao Jul 01 '14 at 22:20
  • @cHao In what way is that not completely proven? It is exactly as stated; I said nothing of userspace. Your assessment of unit tests and black box testing is hopelessly flawed; they can easily prove the absence of bugs, if they are comprehensive. This is known as a proof by exhaustion; it is the same kind of proof used in the four color theorem or many mathematical proofs. While it may be difficult to prove things, it is not impossible. You vastly underestimate what is possible, possibly because if you consider it unfeasible, you do not have to do it. – Alice Jul 03 '14 at 07:49
  • 1
    @Alice: The kernel includes multi-core (SMP) support, but the proofs don't cover it. It is available for x86, but the proofs don't cover that. That means the kernel is *by definition* not completely proven. As for testing, the problem is that a test from the outside can not be comprehensive. Let's say i wrote a `strlen` function that aborts the app if passed a sequence exactly 1555342886 bytes long. Show me a black-box test that would prove it's broken, and i'll show you a test that would either take years to run or violate the very definition of "black-box test". – cHao Jul 03 '14 at 12:51
  • @Alice: For reference: https://en.wikipedia.org/wiki/Unit_testing#Unit_testing_limitations – cHao Jul 03 '14 at 13:39

https://gcc.gnu.org/onlinedocs/cpp/Self-Referential-Macros.html#Self-Referential-Macros answers the question about self referential macros.

The crux of the answer is that when the preprocessor encounters a macro's own name while expanding that macro, it leaves that name unexpanded.

I suspect the same logic prevents expansion of circularly defined macros; otherwise the preprocessor would recurse forever.

R Sahu

In your example the macro expansion happens before the variables of the same name are defined, so whatever each macro expands to, the declarations, assignments and printf all see the same names, and you always print 1, 2, 3!

Here is an example where the variables are defined first:

#include <stdio.h>
int main()
{
    int A = 1, B = 2, C = 3;
#define A B
#define B C
//#define C A
    printf("%d\n", A);
    printf("%d\n", B);
    printf("%d\n", C);
}

This prints 3 3 3. Somewhat insidiously, un-commenting #define C A changes the behaviour of the line printf("%d\n", B);

M.M

Here's a nice demonstration of the behavior described in rici's and Eric Lippert's answers, i.e. that a macro name is not re-expanded if it is encountered again while already expanding the same macro.

Content of test.c:

#define ONE 1, TWO
#define TWO 2, THREE
#define THREE 3, ONE

int foo[] = {
  ONE,
  TWO,
  THREE
};

Output of gcc -E test.c (excluding initial # 1 ... lines):

int foo[] = {
  1, 2, 3, ONE,
  2, 3, 1, TWO,
  3, 1, 2, THREE
};

(I would post this as a comment, but including substantial code blocks in comments is kind of awkward, so I'm making this a Community Wiki answer instead. If you feel it would be better included as part of an existing answer, feel free to copy it and ask me to delete this CW version.)

Ilmari Karonen