13

This code is from Hacker's Delight. It says this is the shortest self-reproducing program (quine) in C, at 64 characters long, but I don't understand it:

    main(a){printf(a,34,a="main(a){printf(a,34,a=%c%s%c,34);}",34);}

I tried to compile it. It compiles with 3 warnings and no errors.

nobody
PDP
  • 8
    3 warnings and no error means successful compilation, why don't you run it? – Cestarian Apr 24 '15 at 01:30
  • 2
    @Cestarian The question isn't *what* does it do - it's *how* does it do it? Hence, the title. – Barry Apr 24 '15 at 01:35
  • 4
    This isn't actually the shortest program. The actual shortest is 0 bytes long. You can get the compiler to successfully compile a 0 byte c file into an executable. Running that exe results in 0 bytes being printed, which is the entire source code for the original program. – Grant Peters Apr 24 '15 at 01:43
  • 3
    @GrantPeters: You can? How? An empty source file is not a valid translation unit. – Keith Thompson Apr 24 '15 at 02:03
  • @KeithThompson see http://stackoverflow.com/questions/17515790/does-compiling-an-empty-file-follow-the-c-standard for the ioccc entry that used this – Grant Peters Apr 24 '15 at 06:43
  • @GrantPeters: Apparently the empty source file isn't even used. – Keith Thompson Apr 24 '15 at 06:46
  • @KeithThompson It is, look at the make file for it. It's passed in to the compiler. To be fair, the entry did win for abusing the rules of the competition. – Grant Peters Apr 24 '15 at 06:48

4 Answers

7

This program relies upon the assumptions that

  • return type of main is int
  • function's parameter type is int by default and
  • the argument a="main(a){printf(a,34,a=%c%s%c,34);}" will be evaluated first.

It invokes undefined behavior: the order of evaluation of function arguments is unspecified in C, and here a is both read and modified in the same argument list without an intervening sequence point.
Nevertheless, the program works as follows:

The assignment expression a="main(a){printf(a,34,a=%c%s%c,34);}" assigns (a pointer to) the string "main(a){printf(a,34,a=%c%s%c,34);}" to a, and the value of the assignment expression is that same value, as per the C standard (C11 6.5.16):

An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment [...]

Keeping in mind the semantics of the assignment operator described above, the program effectively expands to

    main(a){
        printf("main(a){printf(a,34,a=%c%s%c,34);}",34,a="main(a){printf(a,34,a=%c%s%c,34);}",34);
    }

ASCII 34 is the double-quote character ("). The conversion specifiers and their corresponding arguments:

%c ---> 34 
%s ---> "main(a){printf(a,34,a=%c%s%c,34);}" 
%c ---> 34  

A better version would be

main(a){a="main(a){a=%c%s%c;printf(a,34,a,34);}";printf(a,34,a,34);}  

It is 4 characters longer, but at least it follows K&R C.

haccks
5

It relies on several quirks of the C language and (what I think is) undefined behavior.

First, it defines the main function. In older C (K&R and C89), it is legal to define a function without a return type or parameter types; both are presumed to be int. This is why the main(a){ part works.

Then, it calls printf with 4 arguments. Since printf has no prototype in scope, it is implicitly declared as returning int with unspecified parameters (unless your compiler implicitly declares it otherwise, like Clang does).

The first argument is a, which is presumed int and holds argc at the beginning of the program. The second argument is 34 (which is ASCII for the double-quote character). The third argument is an assignment expression that assigns the format string to a and yields the assigned value. It relies on an implicit pointer-to-int conversion, which compilers typically accept with a warning. The last argument is another quote character in numeric form.

At runtime, the %c format specifiers are substituted with quotes, the %s is substituted with the format string, and you get the original source again.

As far as I know, the order of argument evaluation is unspecified. This quine works because the assignment a="main(a){printf(a,34,a=%c%s%c,34);}" is evaluated before a is passed as the first argument to printf, but as far as I know, there is no rule to enforce it. Additionally, this is unlikely to work reliably on 64-bit platforms where int is 32 bits, because the pointer-to-int conversion truncates the pointer to a 32-bit value. As a matter of fact, even though I can see how it works on some platforms, it doesn't work on my computer with my compiler.

zneak
  • Yes, there's UB due to order of evaluation. There's also more UB because the type of `a` (`int`) differs from the type expected by the `%s` conversion (`char *`). – Jerry Coffin Apr 24 '15 at 01:58
  • @zneak thanks now i understand this program and you are really good in c – PDP Apr 24 '15 at 18:03
  • @JerryCoffin; What is "more" or "less" UB? – haccks Apr 24 '15 at 18:16
  • 1
    @haccks: Another instance of undefined behavior, so even if (for example) the UB cited in the answer were somehow fixed, the program as a whole would still have UB. IOW, it is not "behavior that is (more undefined), but "(more behavior) that is undefined". – Jerry Coffin Apr 24 '15 at 18:25
4

This works based on lots of quirks that C allows you to do, and some undefined behavior that happens to work in your favor. In order:

main(a) { ...

Types are assumed to be int if unspecified, so this is equivalent to:

int main(int a) { ...

main is supposed to take either 0 or 2 parameters, so this is undefined behavior, but in practice it works as if the missing second parameter were simply ignored.

Next, the body, which I will space out. Note that a is an int as per main:

printf(a,
       34,
       a = "main(a){printf(a,34,a=%c%s%c,34);}",
       34);

The order of evaluation of arguments is unspecified, but we're relying on the 3rd argument - the assignment - getting evaluated first. We're also relying on the undefined behavior of being able to assign a char * to an int. Also, note that 34 is the ASCII value of ". Thus, the intended effect of the program is:

#include <stdio.h>

int main(int a, char **argv) {
    printf("main(a){printf(a,34,a=%c%s%c,34);}",
           '"',
           "main(a){printf(a,34,a=%c%s%c,34);}",
           '"');
    return 0; // also left off
}

Which, when run, prints:

main(a){printf(a,34,a="main(a){printf(a,34,a=%c%s%c,34);}",34);}

which was the original program. Tada!

Barry
2

The program is supposed to print its own code. Note the similarity of the string literal to the overall program code. The idea is that the literal will be used as the printf() format string because its value is assigned to variable a (albeit in the argument list) and that it will also be passed as the string to print (because an assignment expression evaluates to the value that was assigned). The 34 is the ASCII code for the double quote character ("); using it avoids a format string containing escaped literal quotation mark characters.

The code relies on unspecified behavior in the form of the order of evaluation of the function arguments. If they are evaluated in argument list order then the program is likely to fail because the value of a would then be used as a pointer to the format string before the correct value was actually assigned to it.

Additionally, the type of a defaults to int, and there is no guarantee that int is wide enough to hold an object pointer without truncating it.

Furthermore, the C standard specifies only two permitted signatures for main(), and the signature used is not among them.

Moreover, the type of printf() inferred by the compiler in the absence of a prototype is incorrect. It is by no means guaranteed that the compiler will generate a calling sequence that works for it.

John Bollinger