4

Please note that I have checked the questions related to this title, but as far as I can tell they do not answer this question.

Initially I thought that program 1 and program 2 would give me the same result.

//Program 1

char *a = "abcd";
char *b = "efgh";
printf("%d", strcmp(a,b));


//Output: -4

//Program 2
printf("%d", strcmp("abcd", "efgh"));

//Output: -1

The only difference I can spot is that in program 2 I pass string literals, while in program 1 I pass char * variables as the arguments of the strcmp() function.

Why is there a difference between the behaviour of these seemingly identical programs?

Platform: Linux Mint, compiler: g++

Edit: Actually, program 1 always prints the difference between the ASCII codes of the first mismatched characters, whereas program 2 prints -1 if the ASCII code of the first mismatched character in string 2 is greater than that in string 1, and vice versa.

xrfxlp
    `strcmp` returns a value that is < 0, 0, or > 0. Apart from 0 the actual value is not specified. – Weather Vane Feb 19 '20 at 17:46
  • 2
    They are both correct. The rest is irrelevant. [but if you *really* want to know: check the assembler output] – wildplasser Feb 19 '20 at 17:47
  • When you have a question about a C library function you should first [check some documentation](https://en.cppreference.com/w/c/string/byte/strcmp). This is just one example, there are a lot of similar sites that would contain the same information. – Blastfurnace Feb 19 '20 at 17:51
  • 1
    Please post your code as [mcve], that is complete compileable minimal code that shows the behaviour you describe. – Jabberwocky Feb 19 '20 at 17:52
  • @AjayMishra Try again at -O1 and they both output `-1`. Like this: https://godbolt.org/z/goJ27E – Artyer Feb 19 '20 at 17:54
  • @Blastfurnace I have checked that, my point was the ambiguous behaviour. – xrfxlp Feb 19 '20 at 17:54
  • @Artyer I didn't get you there. – xrfxlp Feb 19 '20 at 17:55
  • 2
    @AjayMishra the behaviour is not ambiguous. It returns a _negative_ value, and that's what the spec says it should do. – Jabberwocky Feb 19 '20 at 17:55
  • 2
    There's nothing ambiguous about it. The only thing the standard guarantees is the return value will be less than, equal to, or greater than 0. Nobody cares about the exact values for some specific example. They are irrelevant and you can't write code assuming -4 or -1 is more "correct". – Blastfurnace Feb 19 '20 at 17:57
  • @Jabberwocky why different negative values? – xrfxlp Feb 19 '20 at 17:57
  • @AjayMishra if you _really_ want to know, you need to look at the generated assembly output. – Jabberwocky Feb 19 '20 at 17:59
  • 4
    It's not ambiguous, it's uncontroversially < 0. Whoever writes the compiler code won't care, and will return whatever significant value that is easiest. There is no requirement to be consistent. – Weather Vane Feb 19 '20 at 18:02

3 Answers

6

This is your C code:

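#include <stdio.h>
#include <string.h>

void x1(void)
{
  const char *a = "abcd";
  const char *b = "efgh";
  printf("%d", strcmp(a,b));    // arguments are pointer variables
}

void x2(void)
{
  printf("%d", strcmp("abcd", "efgh"));   // arguments are string literals
}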

And this is the generated assembly output for both functions:

.LC0:
        .string "abcd"
.LC1:
        .string "efgh"
.LC2:
        .string "%d"
x1:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     QWORD PTR [rbp-8], OFFSET FLAT:.LC0
        mov     QWORD PTR [rbp-16], OFFSET FLAT:.LC1
        mov     rdx, QWORD PTR [rbp-16]
        mov     rax, QWORD PTR [rbp-8]
        mov     rsi, rdx
        mov     rdi, rax
        call    strcmp              // the strcmp function is actually called
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC2
        mov     eax, 0
        call    printf
        nop
        leave
        ret

x2:
        push    rbp
        mov     rbp, rsp
        mov     esi, -1             // strcmp is never called, the compiler
                                    // knows what the result will be and it just
                                    // uses -1
        mov     edi, OFFSET FLAT:.LC2
        mov     eax, 0
        call    printf
        nop
        pop     rbp
        ret

When the compiler sees strcmp("abcd", "efgh") it knows the result beforehand, because it knows that "abcd" comes before "efgh".

But if it sees strcmp(a,b) it does not know and hence generates code that actually calls strcmp.

With another compiler or with different compiler settings things could be different. You really shouldn't care about such details at least at a beginner's level.
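As a rough illustration of what can safely be relied on, the sketch below wraps the two call forms in functions and checks only the sign of each result. The names `same_sign`, `literal_call`, `pointer_call` and `check_calls_agree` are hypothetical and not part of the question; whether either call is folded into a constant depends on the compiler and its settings.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: 1 if x and y are both negative, both zero
   or both positive, 0 otherwise. */
static int same_sign(int x, int y)
{
    return ((x > 0) - (x < 0)) == ((y > 0) - (y < 0));
}

/* Literal arguments: the compiler may replace this call with a constant. */
static int literal_call(void)
{
    return strcmp("abcd", "efgh");
}

/* Pointer arguments: an unoptimised build typically calls the library. */
static int pointer_call(void)
{
    const char *a = "abcd";
    const char *b = "efgh";
    return strcmp(a, b);
}

/* Both results must be negative; their exact values are allowed to differ. */
static void check_calls_agree(void)
{
    assert(literal_call() < 0);
    assert(pointer_call() < 0);
    assert(same_sign(literal_call(), pointer_call()));
}
```

Calling `check_calls_agree()` should pass whether the results happen to be -4 and -1 or any other pair of negative values.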

S.S. Anne
Jabberwocky
  • How does knowing before affects the behaviour? – xrfxlp Feb 20 '20 at 08:01
  • 2
    @AjayMishra `strcmp("abcd", "efgh")` will __always__ under any condition return a negative value. The compiler being smart enough to figure this out just replaces the call to `strcmp` by code that returns a negative value, -1 in this case; it could as well have returned -2 which is also a negative value. The compiler is not expected to generate code that is a one to one translation of the C code you wrote, but it is expected to generate code that __behaves__ as the C code you wrote. – Jabberwocky Feb 20 '20 at 08:16
  • @Jabberwocky +1 nice answer, adding _The **compiler is not expected** to generate code that is a one to one translation of the C code you wrote, but it is expected to generate code that behaves as the C code you wrote._ from your comment to your answer would make it perfect :-) – some user Feb 29 '20 at 06:24
2

It is indeed surprising that strcmp returns 2 different values for these calls, but it is not incompatible with the C Standard:

strcmp() returns a negative value if the first string is lexicographically before the second string. Both -4 and -1 are negative values.
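If a program needs a predictable number rather than just a sign, one option is to normalise the result itself. The following is a minimal sketch, assuming a hypothetical helper named `sign_of_strcmp` (not a standard library function) that collapses any return value to -1, 0 or 1:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reduce strcmp's result to exactly -1, 0 or 1,
   so the value no longer depends on the compiler or the C library. */
static int sign_of_strcmp(const char *lhs, const char *rhs)
{
    /* Passing a null pointer to strcmp is undefined behaviour. */
    assert(lhs != NULL && rhs != NULL);

    int r = strcmp(lhs, rhs);
    return (r > 0) - (r < 0);
}
```

With this wrapper, `sign_of_strcmp("abcd", "efgh")` evaluates to -1 whether the underlying call returned -4, -1 or some other negative value.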

As pointed out by others, the code generated for the different calls is different:

  • the compiler generates a call to the library function in the first program
  • the compiler is able to determine the result of the comparison and generates an explicit result of -1 for the second case where both arguments are string literals.

In order to perform this compile time evaluation, strcmp must be defined in a subtle way in <string.h> so the compiler can determine that the program refers to the C library's implementation and not an alternative that might behave differently. Tracing the corresponding prototype in recent GNU libc include files is a bit difficult with a number of nested macros eventually leading to a hidden prototype.

Note that more recent versions of both gcc and clang will perform the optimisation in both cases as can be tested on Godbolt Compiler Explorer, but neither combines this optimisation with that of printf to generate the even more compact code fputs("-1", stdout); (puts("-1"); would not be equivalent, because puts appends a newline). They seem to convert printf to puts only for string literal formats without arguments.

chqrlie
0

I believe (would need to see (and interpret) machine code) one version works without calling code in the library (as if you wrote printf("%d", -1);).

pmg