7

Why do GCC and Clang produce different output with this conforming C code:

int (puts) (); int (main) (main, puts) int main;
char *puts[(&puts) (&main["\0April 1"])]; <%%>

Neither compiler produces any warning or error, even with -Wall -std=c18 -pedantic, yet the program produces no output when built with GCC and prints the current date when built with Clang.

Eric Postpischil
  • Use `-Wstrict-prototypes -Werror` too; that'll put an end to the nonsense. – Jonathan Leffler Apr 01 '22 at 14:05
  • @JonathanLeffler: Given today's date, I think the nonsense is the whole point :) – Nate Eldredge Apr 01 '22 at 14:10
  • Are we sure this is not a GCC bug? – HolyBlackCat Apr 01 '22 at 14:10
  • If anyone wants their fun spoiled, [here](https://godbolt.org/z/qPrdohTcr) is a less obfuscated example. – Nate Eldredge Apr 01 '22 at 14:17
  • @NateEldredge [The fun just begins](https://stackoverflow.com/questions/71708132/why-do-gcc-and-clang-produce-different-output-with-variable-length-array/71708308?noredirect=1#comment126728973_71708308). – HolyBlackCat Apr 01 '22 at 14:20
  • Kind of a dupe: [Referencing a yet-to-be-mentioned function parameter using the new-style function declarations](https://stackoverflow.com/questions/55388209/referencing-a-yet-to-be-mentioned-function-parameter-using-the-new-style-functio) – Lundin Apr 01 '22 at 14:31
  • Also of interest is https://godbolt.org/z/z98bb38xa, in which both compilers again print the message. I guess the issue is that for a parameter declared as `int arg[expr]`, the value of `expr` is irrelevant to code generation, since `arg` just decays to `int *`, and so gcc can go on without evaluating it (I'm not claiming it is correct to do so). But with `int (*arg)[expr]` there is no choice; we have to know the value of `expr` to do correct pointer arithmetic, so it must be evaluated. – Nate Eldredge Apr 01 '22 at 15:06
  • In which respect https://godbolt.org/z/nd4oxro74, with `int arg[foo()][foo()];`, is similarly fun: gcc prints `Hello` once and clang prints it twice. I'm leaning toward gcc being wrong but careful parsing of the standard might be needed, if it has a clear answer at all. – Nate Eldredge Apr 01 '22 at 15:09

2 Answers

11

In the first place, it is conforming code, though it does make use of a variable-length array, which is an optional language feature in C11 and C17. Some of the obfuscations are

  • use of the obscure digraphs <% and %>, which mean the same thing as { and }, respectively.
  • parenthesizing the function identifiers in function declarations
  • a forward declaration of function puts that is not a prototype
  • a K&R-style definition of function main
    • with a VLA parameter
      • whose dimension expression contains a function call
      • and a reference to another parameter
  • use of unconventional identifiers for the parameters to function main()
  • use of the identifiers puts and main inside the declarations of an object and a function, respectively, that have the same name
  • use of the identifier main for something more than the program's entry-point function
  • inversion of the conventional order of the operands of the indexing operator ([])
    • plus, indexing a string literal
  • calling a function via an explicit function pointer constant expression
  • a string literal with an explicit null character within
  • unconventional placement (and omission) of line breaks
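
Several of these obfuscations can be tried in isolation. The following is only a sketch: the function names (count_chars, inverted_index, call_via_pointer, shadowed_dimension) and the variable len are hypothetical and do not appear in the original program, and the last function assumes a compiler that supports variable-length arrays.

```c
#include <stddef.h>
#include <string.h>

/* Parenthesizing the function identifier changes nothing: this declares
   the same function as "int count_chars(const char *s);" would. */
int (count_chars)(const char *s);

int (count_chars)(const char *s)
<%                                  /* digraph for { */
    return (int) strlen(s);
%>                                  /* digraph for } */

/* i["..."] means *(i + "..."), the same element as "..."[i]. */
char inverted_index(int i)
{
    return i["\0April 1"];
}

/* Calling a function through an explicit pointer to it. */
int call_via_pointer(const char *s)
{
    return (&count_chars)(s);
}

/* The scope of the inner `len` begins only after its declarator ends,
   so the `len` inside the brackets is still the file-scope one. */
static int len = 3;

size_t shadowed_dimension(void)
{
    int len[len];                   /* VLA sized by the outer len */
    return sizeof len / sizeof len[0];
}
```

The last function mirrors what the parameter declaration `char *puts[(&puts) (...)]` does: inside the brackets, the name still refers to the outer declaration, here the file-scope variable rather than the array being declared.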

A less obfuscated equivalent would be

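int puts();  /* non-prototype declaration, as in the original */

int main(
    int argc,
    /* the dimension expression calls puts() on the literal offset by argc */
    char *argv[ puts("\0April 1" + argc) ]
) {
}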

But the difference in behavior between the GCC build and the Clang build comes down to whether the expression for the size of the VLA function parameter is evaluated at runtime.

The language spec says that when a function parameter is declared with array type, its type is "adjusted" to the corresponding pointer type. That applies equally to complete, incomplete, and variable-length array types, but the spec does not explicitly say that the expression(s) for the dimension(s) are not evaluated. It does specify that expressions go unevaluated in certain other cases, and it even makes an exception to such a rule in the case of sizeof expressions involving VLAs, so the omission in this case could be interpreted as meaningful.

That makes a difference only for parameters of VLA type, because only for those can evaluation of the dimension expression(s) produce side effects on the machine state, including, but not limited to, observable program behavior.
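
The two rules mentioned above, parameter adjustment and the sizeof exception for VLAs, can be observed without touching the disputed case. This is a minimal sketch with hypothetical names (second_element, bump, calls_after_plain_sizeof, calls_after_vla_sizeof); it assumes a compiler with VLA support and deliberately avoids a VLA parameter, since that is where the compilers disagree.

```c
#include <stddef.h>

/* An array parameter is adjusted to a pointer, so it can be incremented,
   which would not be allowed on a real array. */
int second_element(int values[3])
{
    values++;
    return *values;
}

static int calls;

/* Helper with a visible side effect: counts how often it is called. */
static int bump(void)
{
    return ++calls;
}

/* Operand is not of VLA type: sizeof does not evaluate it. */
int calls_after_plain_sizeof(void)
{
    calls = 0;
    size_t bytes = sizeof(bump());
    (void) bytes;
    return calls;
}

/* Operand is a VLA type: sizeof evaluates the dimension expression. */
int calls_after_vla_sizeof(void)
{
    calls = 0;
    size_t bytes = sizeof(int[bump()]);
    (void) bytes;
    return calls;
}
```

The call counter stays at 0 in the first sizeof case and reaches 1 in the second, which is the kind of observable side effect at stake for the VLA parameter.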

GCC does not evaluate the VLA parameter's size expression at runtime, and I am inclined to take this as conforming to the intent of the standard. As a result, the GCC-compiled program does nothing but exit with status 0.

Clang does evaluate the VLA parameter's size expression at runtime. Although I disfavor this interpretation of the spec, I cannot rule it out. When it does evaluate the size expression, it uses the passed value of the first parameter. When the program is run without arguments, the first parameter has value 1, so the standard library's puts function is called with a pointer to the 'A' in "\0April 1" and prints April 1.
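
The pointer arithmetic in the size expression can be checked on its own, independently of whether a compiler evaluates it in the parameter list. A small sketch, with hypothetical helper names (dimension_argument, printed_length) that are not part of the original program:

```c
#include <stddef.h>
#include <string.h>

/* Yields the pointer that the dimension expression would pass to puts
   for a given value of the first parameter. */
const char *dimension_argument(int argc)
{
    return &argc["\0April 1"];      /* same as "\0April 1" + argc */
}

/* Number of characters puts would print before its newline. */
size_t printed_length(int argc)
{
    return strlen(dimension_argument(argc));
}
```

With argc equal to 1 the pointer skips the leading null character and lands on "April 1" (7 characters). With argc equal to 0 it points at the null character, so the string is empty.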

John Bollinger
  • Interestingly, GCC also evaluates the size if the modern-style parameter declarations are used. – HolyBlackCat Apr 01 '22 at 16:56
  • Interesting indeed, @HolyBlackCat. Although I am prepared to accept either interpretation of whether the dimension expression is evaluated, I think GCC is taking unwarranted liberties by applying a different interpretation for one parameter-declaration style than for the other. – John Bollinger Apr 01 '22 at 18:24
  • In the case of an argument of type pointer to variable-length array, `int (*arrayptr)[foo()]`, the function `foo()` must be called because the code needs to know the size of `*arrayptr` to do correct pointer arithmetic. So your interpretation seems to have the counterintuitive consequence that in `void blah(int twodim[foo()][bar()]) { }`, we'd have that `bar()` is called and `foo()` is not. – Nate Eldredge Apr 02 '22 at 23:33
  • @NateEldredge, yes, my interpretation would have that consequence. I don't find it especially counterintuitive. At least, not more so than function calls in a function's parameter list in general. – John Bollinger Apr 03 '22 at 12:32
  • I congratulate you for the amount of time and effort spent to decipher this code salad. (: – Edenia Jun 07 '23 at 18:47
0
int (puts) ();
int (main) (main, puts)
    int main;
    char *puts[(&puts) (&main["\0April 1"])];
{
}

Somebody's got a compiler bug; I'm just not sure who anymore. I don't understand why any compiler would emit code to evaluate the size expression of a VLA parameter.

The clang output is rather bizarre. For it to work, it would have had to find main in the function's scope but puts in the global scope despite having already encountered the declaration for puts. Normally, you can access a variable in its own declaration.

If somebody did this in production code, my answer would instead be: "Stop using K&R function definitions."

Joshua
  • If you rewrite to use the modern parameters, GCC also prints the string: https://godbolt.org/z/67TYqWo9o – HolyBlackCat Apr 01 '22 at 14:15
  • @HolyBlackCat: Well that's a double-take. – Joshua Apr 01 '22 at 14:17
  • RE “it would have had to find `main` in the function's scope but `puts` in the global scope”: This aspect of the code is fully defined in the C standard. Per C 2018 6.2.1 7, the scope of an identifier other than a tag or enumeration constant begins after its declarator. And the declarator includes array brackets and function call parentheses. So you can use an identifier in its initializer(s), but, as long as you are still in the declarator, the identifier can refer to something in the enclosing scope. – Eric Postpischil Apr 01 '22 at 14:49