
I wrote the following code in C.

#include <stdio.h>
#include <string.h>
int main(void) {
    char str1[4] = "abcd";
    char str2[4] = "abcd";
    printf("%d\n",strcmp(str1,str2));

    return 0;
}

I expected the return value to be 0 (as I was taught that the strcmp function returns 0 for equal strings). But it prints 1!

Output:

1

Is it a bug? Or am I missing something?

4 Answers


Because your arrays are not long enough. You are not taking into account the zero terminator of your strings. You need 5 chars for your string: four for the string itself plus one for the zero terminator.

Write:

char str1[5] = "abcd";
char str2[5] = "abcd";

BTW, I wonder why your compiler does not issue a warning. Or does it?

Jabberwocky
  • Is the indexing not supposed to start from 0? What I mean is 0 => a, 1 => b, 2 => c, 3 => d, 4 => '\0'. Is that not the case? – shivamtiwari93 Mar 18 '14 at 12:41
  • It does. An array of length 4 has indices from 0 to 3. You need space for the trailing zero, thus your arrays need to be 5 long. – JasonD Mar 18 '14 at 12:41
  • @shivamtiwari93 `char str1[4]` creates an array of **length** 4, not **highest index** 4. As @JasonD said, that means your indices are 0-3, not 0-4. – cf- Mar 18 '14 at 12:44
  • I believe 0 to 3 makes 4 array elements (0th, 1st, 2nd, 3rd), and the 4th one is used to mark the end of the array. Am I wrong? – shivamtiwari93 Mar 18 '14 at 12:44
  • Then why does it not throw an array index out of bounds exception? – Butani Vijay Mar 18 '14 at 12:44
  • Or you could write `char str1[] = "abcd";` and let the compiler calculate the length (5). – user694733 Mar 18 '14 at 12:45
  • I am writing `char str1[4] = "abcd";`, so what does the compiler do with the last character? – Butani Vijay Mar 18 '14 at 12:47
  • @ButaniVijay: there is no such thing as "out of bound" exception in C. If you write further than the last index, the memory after the array will get overwritten and anything can happen then. – Jabberwocky Mar 18 '14 at 12:49
  • @shivamtiwari93 The terminating `\0` [isn't taken into account for `strlen`](http://www.cplusplus.com/reference/cstring/strlen/). – cf- Mar 18 '14 at 12:52
  • If you print the strings in the code in the question, you'll probably see that they're not what you expected... – JasonD Mar 18 '14 at 12:53
  • (most likely it's copied only the 4 chars into the array and not bothered with the terminator, and you're comparing two unterminated, and quite possibly overlapping, strings) – JasonD Mar 18 '14 at 12:56
  • @JasonD `char str[4] = "abcd";` is not in itself undefined behavior. This is perfectly valid shorthand syntax (in **C**) to initialize a 4-character array to `{ 'a', 'b', 'c' , 'd' }`. – Pascal Cuoq Mar 18 '14 at 13:10
  • @PascalCuoq Yes, that was badly worded. The problem is comparing unterminated strings as a result of the initialisation not doing what was expected. – JasonD Mar 18 '14 at 13:13

(Courtesy Pascal Cuoq)

The C99 standard, §6.7.8 ¶14, says:

An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

Since strings are terminated by the null byte '\0', the string literal "abcd" actually consists of 5 characters. The arrays str1 and str2 have size 4, so they cannot hold the null byte and are, in fact, not strings. The following declarations are equivalent:

char str1[4] = "abcd";
char str1[4] = {"abcd"};
char str1[4] = {'a', 'b', 'c', 'd'};

Passing str1 and str2 to strcmp therefore invokes undefined behaviour, because they are not strings. Since neither array contains a terminating null byte, strcmp may read past the end of both arrays looking for one. Anything can happen: the program may print a nonzero value, as it did here, or crash with a segmentation fault.

A string is a character array terminated by the null byte '\0'. Therefore the string literal "abcd" occupies 5 bytes, not 4. Note that the standard library function strlen does not count the null byte, so strlen("abcd") returns 4.

When you initialize an array with a string literal, it is good practice to leave the array size blank; the compiler then makes the array exactly large enough to hold all the characters of the string literal, including the terminating null byte.

#include <stdio.h>
#include <string.h>

int main(void) {
    char str1[] = "abcd";
    char str2[] = "abcd";
    printf("%d\n", strcmp(str1, str2));  // prints 0

    return 0;
}
ajay
  • “undefined behaviour because the character arrays str1 and str2 are not large enough to hold the string literal they are initialized with” No, wrong. The standard says: “An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.” C99 6.7.8:14. Passing `str1` to `strcmp` is undefined behavior, but `char str1[4] = "abcd";` isn't. – Pascal Cuoq Mar 18 '14 at 13:17
  • @PascalCuoq Thank you. Learned a new thing. I have updated my answer. – ajay Mar 18 '14 at 13:23

The string "abcd" looks like this in memory: 'a' 'b' 'c' 'd' '\0'. Like every string, it is terminated by '\0', which is a character just like 'a' or 'b'. Therefore you need room for five chars to store the string "abcd", and must declare the array as char str1[5].

chessweb
  • @ButaniVijay: Because the C compiler doesn't check for violations of array bounds. It is the sole responsibility of the programmer to provide arrays that are sufficiently large. – chessweb Mar 18 '14 at 12:51
  • @ButaniVijay it is `C`, not `java` – Dabo Mar 18 '14 at 12:51

Either give your arrays the correct size, or let the compiler do everything for you:

 char str1[] = "abcd";
 char str2[] = "abcd";

In this case, the compiler will allocate enough space for your strings, including the terminating null byte.

Dabo