I want to know why strcmp() returns different values when it is used more than once in the same function. Below is the program. In the first case, I understand why it prints -6. But in the second case, why does it print -1?

#include<stdio.h>
#include<string.h>
int main()
{
    char a[10] = "aa";
    char b[10] = "ag";
    printf("%d\n",strcmp(a, b));
    printf("%d\n",strcmp("aa","ag"));
    return 0;
}

And the output it produces is below:

[sxxxx@bhlingxxx test]$ gcc -Wall t51.c
[sxxxx@bhlingxxx test]$ ./a.out
    -6
    -1

Why is the output of the second strcmp() -1? Is the compiler doing something here? If so, what exactly is the optimization it performs?

Govind Parmar
  • Take a look at the generated machine code to see what it really does, and how the compiler translated your source code. – Some programmer dude Feb 15 '19 at 14:03
  • You shouldn't care about the exact value it returns (unless it's 0); only if it's less than, equal to, or greater than 0. – Shawn Feb 15 '19 at 14:06
  • Possibly the compiler when comparing two constant strings knows the answer is negative and returns -1. – Rishikesh Raje Feb 15 '19 at 14:06
  • If you assign those literals to pointers (and NOT char arrays), the result will be the same as in the case of char arrays, i.e. `char* c="aa", *d = "ag";` will produce the same `-6`. – Duck Dodgers Feb 15 '19 at 14:08
  • On Clang, the OP's code gives an output `-6 -6`. Only on GCC does it return `-6 -1`. – Duck Dodgers Feb 15 '19 at 14:11
  • @Shawn: The fact that the C standard only specifies the sign of the result does not mean you should not care about the magnitude. Obeying the C standard is a small part of software engineering, and curiosity is a great motivation that aids in learning how compilers work, developing deeper understanding of semantics, and more. It is good and valuable to care and to inquire, as the knowledge gained can lead to writing code that optimizes better, is less likely to contain errors, and more. Knowledge is valuable, and people should care. – Eric Postpischil Feb 15 '19 at 14:16
  • @bruno, I have no idea why `strcmp()`, when passed literal strings as parameters, returns `-1` only for the gcc compiler. In all other cases, `char*` or char array or using clang, it returns -6, always. I can only say it seems implementation-defined. I was merely testing and stating the obvious. `:)`. – Duck Dodgers Feb 15 '19 at 14:19
  • @JoeyMallone I misread. In fact, just one of the strcmp calls is computed during compilation and the other during execution; only the sign (or 0) is relevant, not the value itself (-1 or -6). – bruno Feb 15 '19 at 14:24
  • @bruno, thanks. You had me a little confused there `:D`. I was going over my comments again and again. Yes, it seems the only difference is runtime vs. compile-time evaluation. I guess, since the compiler already knows the answer, it stops earlier during the compile-time comparison. – Duck Dodgers Feb 15 '19 at 14:26
  • @JoeyMallone I get cold and my poor brain doesn't receive enough oxygen, sorry ^^ – bruno Feb 15 '19 at 14:28
  • @bruno, There there. Summer is coming soon. Only a couple of weeks more. `:)` – Duck Dodgers Feb 15 '19 at 14:28
  • @EricPostpischil there is however a danger that being curious about exactly how your system works can lead to you exploring and taking advantage of little details that are in fact specific to your system alone, and which don't produce the same results on other systems. The important thing, I think, is what the standard says ought to happen. As long as you know that, you can explore how different systems implement it to your heart's content, but if you don't, you're potentially going to end up on the wrong side of undefined behavior. – Tim Randall Feb 15 '19 at 14:29

3 Answers

The C standard says the following regarding the return value of strcmp:

Section 7.24.4.2p3:

The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.

So as long as the result fits that description, it is compliant with the C standard. That means the compiler is free to replace a call whose outcome it can determine at compile time with any value of the correct sign.
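To illustrate that both values are equally conforming, here is a minimal sketch. The names `sign_of`, `strcmp_sign_only`, and `same_ordering` are hypothetical helpers, not standard functions: `strcmp_sign_only` returns only -1, 0, or 1, much like the constant GCC substituted, and it always agrees in sign with the library's `strcmp`.

```c
#include <string.h>

/* Hypothetical helper (not a standard function): collapses any
 * comparison result to -1, 0, or 1. */
int sign_of(int x)
{
    return (x > 0) - (x < 0);
}

/* Hypothetical strcmp variant that returns only -1, 0, or 1.
 * It satisfies the standard's requirement just as well as a version
 * that returns the difference of the first mismatching characters. */
int strcmp_sign_only(const char *s1, const char *s2)
{
    return sign_of(strcmp(s1, s2));
}

/* Returns 1 if two comparison results agree in sign, which is the
 * only property portable code may rely on. */
int same_ordering(int r1, int r2)
{
    return sign_of(r1) == sign_of(r2);
}
```

For example, `strcmp_sign_only("aa", "ag")` yields -1, while `strcmp("aa", "ag")` may yield -6 or any other negative number; `same_ordering` treats the two results as equivalent.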

If we look at the assembly code:

.loc 1 7 0
leaq    -32(%rbp), %rdx
leaq    -48(%rbp), %rax
movq    %rdx, %rsi
movq    %rax, %rdi
call    strcmp
movl    %eax, %esi
movl    $.LC0, %edi
movl    $0, %eax
call    printf
.loc 1 8 0
movl    $-1, %esi      # result of strcmp is precomputed!
movl    $.LC0, %edi
movl    $0, %eax
call    printf

In the first case, arrays are passed to strcmp, so a call to strcmp and a call to printf are generated. In the second case, however, string constants are passed to strcmp. The compiler sees this, computes the result itself, optimizes out the actual call to strcmp, and passes the hardcoded value -1 to printf.
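If you want to see what the library's own strcmp returns for the literals, one approach (a sketch, assuming a compiler that folds strcmp as a built-in, as GCC does here) is to call it through a `volatile` function pointer. Because the pointer's value must be read at run time, the compiler cannot assume it still designates strcmp, so it cannot precompute the result. The names `runtime_cmp` and `strcmp_at_runtime` are hypothetical. GCC's `-fno-builtin` option is another way to disable this kind of folding.

```c
#include <string.h>

/* Reading a volatile object must happen at run time, so the compiler
 * cannot know which function this pointer designates and cannot fold
 * the call into a constant. */
static int (*volatile runtime_cmp)(const char *, const char *) = strcmp;

/* Hypothetical wrapper: compares two strings through the library
 * function at run time, even when both arguments are string literals. */
int strcmp_at_runtime(const char *s1, const char *s2)
{
    return runtime_cmp(s1, s2);
}
```

`strcmp_at_runtime("aa", "ag")` then returns whatever the C library computes at run time, which is negative but not necessarily -6 on every platform.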

dbush
From https://linux.die.net/man/3/strcmp:

The strcmp() function compares the two strings s1 and s2. It returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2.

The strcmp function only promises to return a negative value for the comparison given above. The actual value to be returned is not specified.

What has probably happened is that for strcmp("aa","ag") the compiler knows the result is negative and optimises it to -1.

Rishikesh Raje

The only thing that the C standard guarantees for strcmp is that the sign of the return value will indicate the direction of the inequality if there is one, or zero if the strings are exactly equal.

While returning the difference between the numeric values of the chars at the first place they differ is a fairly common implementation, it's not required. If the compiler can look at string constants and know right away what the result of strcmp will be, it may substitute a flat -1, 1, or 0 rather than go through the effort of actually calling the function.
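For reference, the "difference" strategy mentioned above can be sketched like this. It is an illustrative function named `diff_strcmp` (hypothetical), not the code of any particular C library; characters are compared as `unsigned char`, as the standard requires. On an ASCII system it returns 'a' - 'g', i.e. 97 - 103 = -6, for `"aa"` and `"ag"`, which matches the first line of output in the question.

```c
/* Illustrative difference-returning string comparison (hypothetical
 * name; not the actual implementation of any C library). */
int diff_strcmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Advance while the characters match and the first string has not ended. */
    while (*p1 != '\0' && *p1 == *p2) {
        p1++;
        p2++;
    }

    /* Difference of the first mismatching pair, or 0 if both strings ended. */
    return *p1 - *p2;
}
```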

The solution to this is to not write code that relies on a particular implementation of strcmp, no matter how common it may be. Only trust the sign of the return value.
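In practice, trusting only the sign means testing `< 0`, `== 0`, or `> 0`, never comparing against -1 or 1. A minimal sketch follows; `compare_strings`, `string_less`, and `sort_strings` are hypothetical names. Passing strcmp's result straight through a `qsort` comparator is fine because `qsort` also relies only on the sign.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of char pointers. Returning strcmp's
 * result directly is fine: qsort only looks at its sign. */
static int compare_strings(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Portable "less than" test: check the sign, never a specific value
 * such as -1, which only the constant-folded call happened to return. */
int string_less(const char *s1, const char *s2)
{
    return strcmp(s1, s2) < 0; /* not: strcmp(s1, s2) == -1 */
}

/* Sorts an array of strings in ascending strcmp order. */
void sort_strings(const char **items, size_t count)
{
    qsort(items, count, sizeof *items, compare_strings);
}
```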

Govind Parmar