I do not understand strcmp results

Question

This is my implementation of strcmp:

    #include <stdio.h>
    #include <string.h>

    int ft_strcmp(const char *s1, const char *s2)
    {
        while (*s1 == *s2)
        {
            if (*s1 == '\0')
                return (0);
            s1++;
            s2++;
        }
        return (*s1 - *s2);
    }

    int main()
    {
        char    s1[100] = "bon";
        char    s2[100] = "BONN";
        char    str1[100] = "bon";
        char    str2[100] = "n";
        printf("%d\n", ft_strcmp(s1, s2));
        printf("%d\n", ft_strcmp(str1, str2));
        return (0);
    }

It is from the Kernighan and Ritchie book, but I use a while loop instead of the for. I have tested it many times and my strcmp gives the same results as the original strcmp, but I do not understand the results. I read the man page: "The strcmp() and strncmp() functions lexicographically compare the null-terminated strings s1 and s2." What does "lexicographically" mean? "return an integer greater than, equal to, or less than 0, according as the string s1 is greater than, equal to, or less than the string s2." I understand this part, but my question is how it can come up with results like these:

32
-12

s1 looks less than s2 to me, so why do I get 32, and how is it calculated? str1 looks greater than str2 to me, so why do I get -12, and how is it calculated? I have compiled it with the real strcmp and I get the same results.

Last question: why do I need to compare *s1 to '\0'? Won't it work fine without it?

Thank you for your answers, I'm confused.

This isn't quite equivalent to the standard `strcmp` function. It can fail if either string contains characters with negative values. This can happen only if plain `char` is signed, which it commonly is. Quoting the standard: "The sign of a nonzero value returned by the comparison functions `memcmp`, `strcmp`, and `strncmp` is determined by the sign of the difference between the values of the first pair of characters (both interpreted as `unsigned char`) that differ in the objects being compared." — Keith Thompson, Mar 15 '15 at 14:48
A couple of answers mention ASCII. That's a character set with one encoding. A character set maps a character to a number. An encoding maps the number to byte(s). You're probably not using ASCII (nor ever will). Windows-1252 (and similar) and Unicode/UTF-8 are much more common. It's important to know which character set and encoding you are using. The character number would determine the lexicographic ordering. The algorithm must deal with the encoding. Lexicographic ordering is the simplest. In general, ordering is specified by a collation, which can be associated with a locale or culture. — Tom Blodget, Mar 15 '15 at 15:34

score 3 · Accepted Answer · answered Mar 15 '15 at 14:27

1) K&R's version compares the ASCII values of the characters, which is why you get 32 and -12. Check an ASCII table and you'll understand.

2) If you don't check for '\0', how can you know where the string ends? That's the C string terminator.

uraimo


I looked at the ASCII table and tried to do the calculation using octal and decimal, but I don't get the same results. – user3540997 Mar 15 '15 at 14:30

Look at the decimal values. Your first pair of strings already differs at the first character: (b=98) - (B=66) = 32 – uraimo Mar 15 '15 at 14:33
My bad, I was adding up all the letters, but only the first one is compared, which is why I couldn't get the same results: 98 - 110 = -12 and 98 - 66 = 32. – user3540997 Mar 15 '15 at 14:35

The explicit check for `'\0'` only applies when the two strings are identical (and returns `0` to indicate that). If they're not identical, the function compares characters at the first position where they're unequal. If one string is an initial substring of the other (`"abc"` vs. `"abcde"` or vice versa), it compares the terminating `'\0'` of one string to the corresponding non-null character of the other. – Keith Thompson Mar 15 '15 at 14:44

score 1 · Answer 2 · answered Mar 15 '15 at 14:28

In ASCII, capital letters precede lowercase letters ('B' is 66, 'b' is 98).

So in terms of lexicographic ordering, s1 is treated as greater than s2, because at the first position where the strings differ, s1 has the character with the larger ASCII value.

VHarisop


score 0 · Answer 3 · answered Mar 15 '15 at 14:41

So we compare *s1 to '\0' to detect where the string ends, and the result is computed from the decimal values of the first pair of characters that differ between the two strings.

user3540997


score 0 · Answer 4 · edited Apr 25 '18 at 06:01

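    int ft_strcmp(char *s1, char *s2)
    {
        int x;

        x = 0;
        /* Advance while both strings continue and their characters match. */
        while (s1[x] != '\0' && s2[x] != '\0' && s1[x] == s2[x])
            x++;
        /* Difference of the first mismatching pair, or 0 if both ended. */
        return (s1[x] - s2[x]);
    }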

by mokgohloa ally

Vishal Chhodwani


answered Apr 25 '18 at 05:55

alfred mokgohloa
