100

The following code compiles without problems:

#include <stdio.h>

int main() {
    printf("Hi" "Bye");
}

However, this does not compile:

#include <stdio.h>

int main() {
    int test = 0;
    printf("Hi" (test ? "Bye" : "Goodbye"));
}

What is the reason for that?

José D.
  • 95
    String concatenation is part of the early lexing phase; it's not part of the expression syntax of C. In other words, there is no *value* of type "string literal". Rather, string literals are lexical elements of the source code that form values. – Kerrek SB May 16 '16 at 17:12
  • 24
    Just to clarify @KerrekSB's answer: concatenation of the strings is part of processing the code text *prior to* compiling it, while the ternary operator is evaluated at run time, after the code is compiled (or, if everything is constant, it can be evaluated at compile time). – Eugene Sh. May 16 '16 at 17:15
  • Where would the result of the concatenation be stored? How long would it remain valid? – David Schwartz May 16 '16 at 17:41
  • There is no builtin dynamic string concatenation operator in C. If you want to dynamically concatenate strings, you must allocate storage for them, then use something like `strcat` or `sprintf`. – Tom Karzes May 16 '16 at 18:32
  • 2
    Detail: In this post, `"Hi"` and `"Bye"` are _string literals_, not _strings_ as used in the C standard library. With _string literals_, the compiler will concatenate `"H\0i" "B\0ye"`. Not the same with `sprintf(buf,"%s%s", "H\0i" "B\0ye");` – chux - Reinstate Monica May 16 '16 at 19:18
  • If you really want to concatenate using a constant for control, you can do it via `#define`. And here, the value of `test` is actually unknown to the compiler, since this is not a constant, but a variable. – trolley813 May 16 '16 at 19:20
  • 15
    More-or-less the same reason you can't do `a (some_condition ? + : - ) b` – user253751 May 17 '16 at 10:15
  • 4
    Note that even `printf("Hi" ("Bye"));` won't work — it doesn't require the ternary operator; the parenthesis is sufficient (though `printf("Hi" test ? "Bye" : "Goodbye")` also wouldn't compile). There are only a limited number of tokens that can follow a string literal. Comma `,`, open square bracket `[`, close square bracket `]` (as in `1["abc"]` — and yes, it is gruesome), close round bracket `)`, close curly bracket `}` (in an initializer or similar context), and semicolon `;` are legitimate (and another string literal); I'm not sure there are any others. – Jonathan Leffler May 17 '16 at 19:43
  • @JonathanLeffler: You missed `L`, as in `L"Wide " L"String"`, `+` (as in `"Hello" + 1`, which is a pointer to `"ello"`) and `-` (same as `+`), `==` obviously, and possibly di/trigraph equivalents of all these. – MSalters May 18 '16 at 08:54
  • @MSalters: Thanks! Yes, I didn't think of `L` as able to follow a string literal because I was thinking that it (and the C11 `u`, `U` and `u8` prefixes) come before the string, not after it. But if you have `L"long" L"short"` as adjacent (wide character) string literals, then the `L` can appear after the close quote of the first string. Similarly with the `u` and `U` and `u8` prefixes too. The `+` and `-` operators are relevant; the relational operators are syntactically legitimate but of dubious semantic value. There are digraph and trigraph synonyms for `{}[]` in theory. – Jonathan Leffler May 18 '16 at 15:04
  • @MSalters: I suppose you could have `#define M(x) "prefix" # x` too (and there's a digraph and trigraph for `#`). Notable omissions from these possibilities include `(` and 'identifier' — I don't think they're ever valid. – Jonathan Leffler May 18 '16 at 15:06
  • @JonathanLeffler: Well, if you consider macros, then `"prefix" FOO` can be legal depending on what it will be replaced with. But of course, in the phases of translation where macros are not yet replaced, you don't have identifiers yet. – MSalters May 18 '16 at 15:10
  • @MSalters: Ugh...yes. It's a delicate proposition saying "it can't be done" in C. – Jonathan Leffler May 18 '16 at 15:12
  • @MSalters Is the L not part of the string literal token? – user253751 May 19 '16 at 08:14
  • @immibis: Yes, but I was talking about the situation where you had two adjacent wide string tokens. The second L follows the first string literal. – MSalters May 19 '16 at 08:34
  • @MSalters Is the L not part of the *second* string literal token? – user253751 May 19 '16 at 09:15
  • @immibis : It is, yes. – MSalters May 19 '16 at 09:17
  • 1
    @MSalters In that case, an `L` token is not able to follow a string literal - it is a string literal following another string literal. – user253751 May 19 '16 at 09:18

9 Answers

137

As per the C11 standard, chapter §5.1.1.2, concatenation of adjacent string literals:

Adjacent string literal tokens are concatenated.

happens in translation phase 6. On the other hand:

printf("Hi" (test ? "Bye" : "Goodbye"));

involves the conditional operator, which is evaluated at run-time. So, at compile time, during the translation phase, there are no adjacent string literals present, hence the concatenation is not possible. The syntax is invalid and thus reported by your compiler.


To elaborate a bit on the why: during translation, before any expression is parsed, the adjacent string literals are concatenated and represented as a single string literal (token). The storage is allocated accordingly and the concatenated string literal is considered as a single entity (one string literal).

On the other hand, in the case of run-time concatenation, the destination must have enough memory to hold the concatenated result; otherwise there is no way to access the expected output. String literals already have their memory allocated at compile time, and that storage cannot be extended to fit any further input appended to the original content. In other words, there is no way for the concatenated result to be accessed (presented) as a single string literal. So, this construct is inherently incorrect.

Just FYI, for run-time string (not literals) concatenation, we have the library function strcat() which concatenates two strings. Notice, the description mentions:

char *strcat(char * restrict s1, const char * restrict s2);

The strcat() function appends a copy of the string pointed to by s2 (including the terminating null character) to the end of the string pointed to by s1. The initial character of s2 overwrites the null character at the end of s1. [...]

So we can see that s1 must be a string held in a modifiable array with room for the result, not a string literal. However, as the content of s2 is not altered in any way, it can very well be a string literal.

gsamaras
Sourav Ghosh
  • 1
    you might want to add an extra explanation about `strcat`: the destination array must be long enough to receive the characters from `s2` plus a null terminator after the characters present there already. – chqrlie Apr 30 '19 at 20:00
123

According to the C Standard (5.1.1.2 Translation phases)

1 The precedence among the syntax rules of translation is specified by the following phases.

  6. Adjacent string literal tokens are concatenated.

And only after that

  7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.

In this construction

"Hi" (test ? "Bye" : "Goodbye")

there are no adjacent string literal tokens. So this construction is invalid.

Vlad from Moscow
  • 43
    This only repeats the assertion that it's not allowed in C. It does not explain _why_, which was the question. Don't know why it accumulated 26 upvotes in 5 hours.... and the accept, no less! Congratulations. – Lightness Races in Orbit May 16 '16 at 23:03
  • 4
    Have to agree with @LightnessRacesinOrbit here. Why shouldn't `(test ? "Bye" : "Goodbye")` evaluate to either of the string literals, essentially _making_ `"Hi" "Bye"` or `"Hi" "Goodbye"`? (my question is answered in the other answers) – Insane May 17 '16 at 00:30
  • 49
    @LightnessRacesinOrbit, because when people normally ask why something doesn't compile in C, they're asking for clarification on which rule it breaks, not why *Standards Authors of Antiquity* chose it to be that way. – user1717828 May 17 '16 at 00:57
  • 4
    @LightnessRacesinOrbit The question you describe would probably be off topic. I can't see any *technical* reason why it wouldn't be possible to implement that, so without a definitive answer from the authors of the specification, all answers would be opinion based. And it would generally not fall into the category of "practical" or "answerable" questions (as the [help/dont-ask] indicates we require). – jpmc26 May 17 '16 at 07:37
  • 12
    @LightnessRacesinOrbit It explains *why*: "because the C standard said so". A question about why this rule is defined the way it is would be off topic. – user11153 May 17 '16 at 09:26
  • 2
    @user11153: An answer "because it is" is 100% useless. See my answer for how this question _should_ be approached (and there's nothing "off topic" about it - what?!) – Lightness Races in Orbit May 17 '16 at 10:14
  • 3
    @LightnessRacesinOrbit *'And an answer "because it is" is 100% useless.'* - many answers to C/C++ questions are *"Here is why: "*. Are they all useless? This answer could be better, but it is a valid answer. – user11153 May 17 '16 at 10:20
  • 1
    @user11153: It depends on the question.... Come on, this is easy! When someone wants to know how the standard defines something, then quoting the standard obviously makes sense. When someone asks "why does C not allow " then quoting the passage showing that C does not allow is useless. You're not answering the question: you're merely affirming its premise. – Lightness Races in Orbit May 17 '16 at 10:28
  • 2
    @LightnessRacesinOrbit a sufficiently clever compiler (disregarding the standard) could inline the ternary operator in this case, making what's left concatenatable by current rules. In your answer you're only moving the "why" to "why doesn't the preprocessor know about the value of `test`". What about `"a" (0 ? "b" : "c")`? Certainly the precompiler would know about `0`? These are the exact questions that require a standards reference. Explaining the motivation might be interesting, but inferring that it _has_ to be this way absent of a standard is generally not true. – thebjorn May 17 '16 at 11:43
  • @thebjorn: That works in _this_ case, but not in the general case. That's why the language rule cannot account for it. Yes I am "only" moving the why to that question because that is the important question that is the entire reason for this behaviour. And the answer is pretty obvious. In general, `test` could come from absolutely anywhere. – Lightness Races in Orbit May 17 '16 at 11:43
  • 1
    @LightnessRacesinOrbit if you look at Lisp macros or immediate words in Forth, there is no reason a sufficiently clever preprocessor/macro-language could not account for the general case. (but that's not C, because C is the language as defined by...) – thebjorn May 17 '16 at 11:47
  • 2
    @thebjorn: How could these concatenations be performed at compile-time when "the thing to concatenate" is not known until four years later, two hundred miles away? You would have to pre-store the result of both possibilities (which is an available choice to be sure - not a great one but hey) or do the concat at run-time. The latter probably covers the "you could do this" part. At _this_ point you can say "the standard decided literal concat would take place at compile-time", fine ;) Though there are philosophical reasons you can give for that to complete the story. – Lightness Races in Orbit May 17 '16 at 11:52
  • 2
    @LightnessRacesinOrbit fair enough, although in this case simple constant evaluation at compile time would obviate the need for any runtime support :-) fwiw/imho, Sourav Ghosh is probably the best answer at this point.. – thebjorn May 17 '16 at 11:52
  • @thebjorn: I think so too – Lightness Races in Orbit May 17 '16 at 11:52
  • 1
    @LightnessRacesinOrbit The fact it can't be preconcatenated at compile time does not exclude the possibility that runtime concatenation could be inferred from the resulting type of the ternary without the explicit concatenation symbol. So this code *could* work if the compilers implemented it. The why *is* only because of the specification (and the fact that compilers adhere to the spec). The question says *nothing* specifically about compile time concatenation. – jpmc26 May 17 '16 at 18:28
  • @jpmc26: _"the concatenation could be inferred from the resulting type of the ternary without the explicit concatenation symbol"_ Sorry, I didn't understand this. Could you rephrase? – Lightness Races in Orbit May 17 '16 at 18:32
  • 1
    @LightnessRacesinOrbit The OP sees that two consecutive strings are automatically concatenated: `"Hi" "Bye"`. There's no reason the compiler could not analyze the ternary, see that it results in a string (in this case, it clearly always results in a string), and generate code that concatenates the strings at runtime. In other words, it could act as if there was an implicit `strcat` call on the two strings. So the only "why" is because the language designers chose not to bother. – jpmc26 May 17 '16 at 18:38
  • @jpmc26: Oh, okay, well I already conceded that the concatenation could be done at runtime.... six hours ago! – Lightness Races in Orbit May 17 '16 at 18:42
  • 4
    @LightnessRacesinOrbit: Rather than criticize the answer, which makes a good-faith attempt to answer the question, instead criticize the question. "Why" questions are *vague*. Why does program X exhibit behaviour Y? Because that's what the spec says. Why does the spec say that? Because that's what the designers wrote. Why did the designers write that? Because they had twenty hours of vigorous debate about the pros and cons, and arrived at this as a reasonable compromise. Why this compromise and not another? Blah blah blah, it goes on forever. **Reject the question**. – Eric Lippert May 18 '16 at 20:59
  • @LightnessRacesinOrbit: A better question would be *answerable*. "What section of the specification describes this rule?" is answerable. Even "what considerations motivated this design decision?" is a question that has an answer; the answer might be known by only a few people, but it's a question with an answer. – Eric Lippert May 18 '16 at 21:01
  • @LightnessRacesinOrbit You really do need to understand that the explanation given in this answer is sufficient: the language specification says so. Lexical concatenation precedes expression parsing. **Your own answer** says exactly the same thing. – user207421 May 19 '16 at 02:01
  • 1
    @EJP: "You really do need to understand" is a pretty arrogant thing to say when you really mean "I disagree with your opinion", and when you're objectively wrong. _My own answer_ does **not** say "exactly the same thing". I don't know why I have to keep repeating this, but I'll do so one final time. This answer merely quotes the standard and repeats the question's premise that "this construction is invalid" (truthful, but adds nothing). My answer attempts to explain _why_ this is the case. Now read the question again. "_Why_ does C not allow..." Hope that helps. – Lightness Races in Orbit May 19 '16 at 09:17
41

String literal concatenation is performed by the preprocessor at compile-time. There is no way for this concatenation to be aware of the value of test, which is not known until the program actually executes. Therefore, these string literals cannot be concatenated.

Because the general case is that you wouldn't have a construction like this for values known at compile-time, the C standard was designed to restrict the auto-concatenation feature to the most basic case: when the literals are literally right alongside each other.

But even if it did not word this restriction in that way, or if the restriction were differently-constructed, your example would still be impossible to realise without making the concatenation a runtime process. And, for that, we have the library functions such as strcat.

Lightness Races in Orbit
  • 3
    I just read assumptions. While what you say is pretty much valid, you can't provide sources for it since there are none. The only source with regard to C is the standard document, which (while it is in many cases obvious) doesn't state why some things are the way they are but just states that they have to be that specific way. So being that nit-picky about Vlad from Moscow's answer is inappropriate. Since the OP can be broken down to "Why is it that way?", where the only correctly sourced answer is "Because it is C, and that's the way C is defined", that's the only literally straight correct answer. – dhein May 17 '16 at 13:43
  • 1
    This is (admittedly) lacking in explanation. But here again, that being said, Vlad's answer serves much more as an explanation of the core problem than yours does. Again: while I can confirm that the information you give is related and correct, I disagree with your complaints. And while I wouldn't consider yours off topic either, from my POV it's more off topic than Vlad's actually is. – dhein May 17 '16 at 13:43
  • 11
    @Zaibis: The source is me. Vlad's answer is not an explanation at all; it is merely a confirmation of the premise of the question. Certainly neither of them is "off topic" (you might want to look up what that term means). But you are entitled to your opinion. – Lightness Races in Orbit May 17 '16 at 13:48
  • Even after reading above comments, I still wonder who downvoted this answer ᶘ ᵒᴥᵒᶅ I believe this is a perfect answer unless OP asks for further clarifications on this answer. – Mohit Jain May 18 '16 at 08:52
  • 2
    I am unable to distinguish why this answer is acceptable to you and @VladfromMoscow's isn't, when they both say the same thing, and when his is backed by a citation and yours isn't. – user207421 May 19 '16 at 02:03
  • 1
    @EJP: OMG. This _again_. They do **not** say the same thing. His repeats the premise of the question. Mine answers it. Do a diff on the two pieces of text; you'll see that they are _quite_ different. – Lightness Races in Orbit May 19 '16 at 09:21
31

Because C has no string type. String literals are compiled to char arrays, referenced by a char* pointer.

C allows adjacent literals to be combined at compile-time, as in your first example. The C compiler itself has some knowledge about strings. But this information is not present at runtime, and thus concatenation cannot happen.

During the compilation process, your first example is "translated" to:

int main() {
    static const char char_ptr_1[] = {'H', 'i', 'B', 'y', 'e', '\0'};
    printf(char_ptr_1);
}

Note how the two strings are combined to a single static array by the compiler, before the program ever executes.

However, your second example is "translated" to something like this:

int main() {
    static const char char_ptr_1[] = {'H', 'i', '\0'};
    static const char char_ptr_2[] = {'B', 'y', 'e', '\0'};
    static const char char_ptr_3[] = {'G', 'o', 'o', 'd', 'b', 'y', 'e', '\0'};
    int test = 0;
    printf(char_ptr_1 (test ? char_ptr_2 : char_ptr_3));
}

It should be clear why this does not compile. The ternary operator ? is evaluated at runtime, not compile-time, when the "strings" no longer exist as such, but only as simple char arrays, referenced by char* pointers. Unlike adjacent string literals, adjacent char pointers are simply a syntax error.

Unsigned
  • 2
    Excellent answer, possibly the best here. "It should be clear why this does not compile." You might consider expanding that with "because the ternary operator is a conditional evaluated at *run time* not *compile time*". – cat May 17 '16 at 01:02
  • Shouldn't `static const char *char_ptr_1 = {'H', 'i', 'B', 'y', 'e', '\0'};` be `static const char *char_ptr_1 = "HiBye";` and similarly for the rest of the pointers? – Spikatrix May 17 '16 at 06:05
  • @CoolGuy When you write `static const char *char_ptr_1 = "HiBye";` the compiler translates the line to `static const char *char_ptr_1 = {'H', 'i', 'B', 'y', 'e', '\0'};`, so no, it should not be written "like a string". As the Answer says, strings are compiled to an array of chars, and if you were assigning an array of chars in its most "raw" form, you would use a comma separated list of chars, just like `static const char *char_ptr_1 = {'H', 'i', 'B', 'y', 'e', '\0'};` – Ankush May 17 '16 at 10:42
  • 3
    @Ankush Yes. But although `static const char str[] = {'t', 'e', 's', 't', '\0'};` is the same as `static const char str[] = "test";`, `static const char* ptr = "test";` is _not_ the same as `static const char* ptr = {'t', 'e', 's', 't', '\0'};`. The former is valid and will compile but the latter is invalid and does not do what you expect. – Spikatrix May 17 '16 at 13:06
  • I have fleshed out the last paragraph and corrected the code examples, thanks! – Unsigned May 18 '16 at 18:01
14

If you really want to have both branches produce compile-time string constants to be chosen at runtime, you'll need a macro.

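#include <stdio.h>

/* s, a and b must be string literals: each branch then holds two
   adjacent literals, which the compiler joins at translation time. */
#define ccat(s, t, a, b) ((t) ? (s a) : (s b))

int main(int argc, char **argv) {
    printf("%s\n", ccat("hello ", argc > 2, "y'all", "you"));
    return 0;
}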
Eric
10

What is the reason for that?

Your code uses the ternary operator to choose conditionally between two string literals. Whether or not the condition is known, the choice is not made during the translation phase that joins adjacent literals, so the code can't compile. Even the statement printf("Hi" (1 ? "Bye" : "Goodbye")); wouldn't compile. The reason is explained in depth in the other answers. One way to make a statement with the ternary operator compile is to add a format tag and pass the result of the ternary expression as an additional argument to printf. Even then, the printf() output only gives the impression of the strings "having been concatenated", and only at run time.

#include <stdio.h>

int main() {
    int test = 0;
    printf("Hi %s\n", (test ? "Bye" : "Goodbye")); //specify format and print as result
}
user3078414
  • 3
    SO is not a Tutorial site. You should give an Answer to the OP and not a tutorial. – Michi May 16 '16 at 17:41
  • 1
    This does not answer the OP's question. It may be an attempt to solve the OP's underlying problem, but we don't really know what that is. – Keith Thompson May 16 '16 at 17:47
  • 1
    `printf` doesn't *require* a format specifier; if only the concatenation were done at compile time (which it isn't), OP's use of printf would be valid. – David Conrad May 17 '16 at 07:22
  • Thanks for your remark, @David Conrad. My sloppy wording would indeed make appear as if stating `printf()` would require a format tag, which is absolutely not true. Corrected! – user3078414 May 17 '16 at 08:45
  • That's a better wording. +1 Thanks. – David Conrad May 17 '16 at 09:08
7

In printf("Hi" "Bye"); you have two consecutive arrays of char which the compiler can make into a single array.

In printf("Hi" (test ? "Bye" : "Goodbye")); you have one array followed by a pointer to char (an array converted to a pointer to its first element). The compiler cannot merge an array and a pointer.

Peter Mortensen
pmg
0

To answer the question, I would go to the definition of printf. The function printf expects a const char* as its argument. Any string literal such as "Hi" is a const char*; however, an expression such as (test)? "str1" : "str2" is NOT a const char* because the result of such an expression is found only at run time and hence is indeterminate at compile time, a fact which duly causes the compiler to complain. On the other hand, this works perfectly well: printf("hi %s", test? "yes":"no")

Stats_Lover
  • *however an expression such as `(test)? "str1" : "str2"` is NOT a `const char*`...* Of course it is! It is not a constant expression, but its type **is** `const char *`. It would be perfectly fine to write `printf(test ? "hi " "yes" : "hi " "no")`. The OP's problem has nothing to do with `printf`, `"Hi" (test ? "Bye" : "Goodbye")` is a syntax error no matter what the expression context is. – chqrlie Apr 30 '19 at 19:57
  • Agreed. I confused the output of an expression with the expression itself – Stats_Lover May 01 '19 at 01:24
-4

This does not compile because the parameter list for the printf function is

(const char *format, ...)

and

("Hi" (test ? "Bye" : "Goodbye"))

does not fit the parameter list.

gcc tries to make sense of it by imagining that

(test ? "Bye" : "Goodbye")

is a parameter list, and complains that "Hi" is not a function.

Rodbots
  • 6
    Welcome to Stack Overflow. You're right that it doesn't match the `printf()` argument list, but that's because the expression isn't valid anywhere — not just in a `printf()` argument list. In other words, you've picked a far too specialized reason for the problem; the general problem is that `"Hi" (` is not valid in C, let alone in a call to `printf()`. I suggest you delete this answer before it is down-voted. – Jonathan Leffler May 17 '16 at 19:28
  • That is not how C works. This is not parsed as trying to call a string literal like PHP. – cat May 17 '16 at 20:24