4

I'm working to try and understand some string functions so I can more effectively use them in later coding projects, so I set up the simple program below:

#include <stdio.h>
#include <string.h>

int main (void)
{
// Declare variables:
char test_string[5];
char test_string2[] = { 'G', 'O', '_', 'T', 'E', 'S', 'T'};
int init; 
int length = 0;
int match;

// Initialize array:
for (init = 0; init < strlen(test_string); init++)
{    test_string[init] = '\0';
}

// Fill array:
test_string[0] = 'T';
test_string[1] = 'E';
test_string[2] = 'S';
test_string[3] = 'T';

// Get Length:
length = strlen(test_string);

// Get number of characters from string 1 in string 2:
match = strspn(test_string, test_string2);

printf("\nstrlen return = %d", length);
printf("\nstrspn return = %d\n\n", match);

return 0;
}

I expect to see a return of:

strlen return = 4
strspn return = 4

However, I see strlen return = 6 and strspn return = 4. From what I understand, char test_string[5] should allocate 5 bytes of memory and place hex 00 into the fifth byte. The for loop (which should not even be necessary) should then set all the bytes of memory for test_string to hex 00. Then, the immediately following lines should fill test_string bytes 1 through 4 (or test_string[0] through test_string[3]) with what I have specified. Calling strlen at this point should return a 4, because it should start at the address of test_string[0] and increment a count until it hits the first null character, which is at test_string[4]. Yet strlen returns 6. Can anyone explain this? Thanks!

Ryan Barker
  • Welcome to Stack Overflow. Please read the [About] page. Note that the claim in your question is implausible; `strlen()` is both very simple and very widely tested. If it was wrong, it would be a well-known problem (that would be fixed very quickly). Therefore, your problem is that `strlen()` is not giving you the result you expect, but it is more likely that your expectations are wrong than that `strlen()` is wrong. – Jonathan Leffler Feb 28 '14 at 02:15
  • Hi Jonathan, I suppose that was a bad title for this question. I definitely know that strlen is written correctly and nothing is wrong with it, and that my problem is due to something I'm doing, I just cannot figure out what it is for the life of me. I took a quick scan of the about page before joining, but I'll go give it a full read for you in a minute. Thanks! – Ryan Barker Feb 28 '14 at 02:23
  • Take a look at the upper bound of your init loop. I don't see any code here at all that would put a null byte into your string. (Hint: allocating a variable without an initializer does not guarantee anything about the contents of memory there.) – BRPocock Feb 28 '14 at 02:38
  • BRPocock, thanks. I realized your hint talking to the others who gave answers, and figured out that the problem was indeed in my initialization loop. As many pointed out, the best idea would have been: char test_string[] = "TEST"; and char test_string2[] = "GO_TEST"; - I do not know why I didn't think to use these, as that trick is one I have pulled many, many times in the past. Apparently, in my around 1.5 years of coding, I still haven't learned that the compiler is not my friend haha. C is a very efficient language, I just have to make sure I know how it ticks before making assumptions. – Ryan Barker Feb 28 '14 at 02:43

3 Answers

7
char test_string[5];

test_string is an array of 5 uninitialized char objects.

for (init = 0; init < strlen(test_string); init++)

Kaboom. strlen scans for the first '\0' null character. Since the contents of test_string are garbage, the behavior is undefined. It might return a small value if there happens to be a null character, or a large value or program crash if there don't happen to be any zero bytes in test_string.

Even if that weren't the case, evaluating strlen() in the header of a for loop is inefficient. Each strlen() call has to re-scan the entire string (assuming you've given it a valid string), so if your loop worked it would be O(N^2).
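A minimal sketch of a clearing loop that avoids both problems, assuming the buffer size is known to the caller. The helper names clear_buffer and cleared_then_filled_length are hypothetical and do not come from the question. The loop bound is passed in (for example as sizeof test_string), so the uninitialized contents are never read and nothing is re-scanned on each iteration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero every byte of a buffer whose size is known.
   The bound comes from the caller (e.g. sizeof buf), not from strlen(),
   so the buffer's prior contents are never read. */
void clear_buffer(char *buf, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        buf[i] = '\0';
    }
}

/* Hypothetical demo: clear a 5-byte array, store "TEST" in the first
   four bytes, and return the resulting string length. */
size_t cleared_then_filled_length(void)
{
    char test_string[5];

    clear_buffer(test_string, sizeof test_string);

    test_string[0] = 'T';
    test_string[1] = 'E';
    test_string[2] = 'S';
    test_string[3] = 'T';
    /* test_string[4] is still '\0' from clear_buffer(). */

    return strlen(test_string);
}
```

With the array cleared this way, cleared_then_filled_length() returns 4, because the first null character is at test_string[4].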

If you want test_string to contain just zero bytes, you can initialize it that way:

char test_string[5] = "";

or, since you initialize the first 4 bytes later:

char test_string[5] = "TEST";

or just:

char test_string[] = "TEST";

(The latter lets the compiler figure out that it needs 5 bytes.)

Going back to your declarations:

char test_string2[] = { 'G', 'O', '_', 'T', 'E', 'S', 'T'};

This causes test_string2 to be 7 bytes long, without a trailing '\0' character. That means that passing test_string2 to any function that expects a pointer to a string will cause undefined behavior. You probably want something like:

char test_string2[] = "GO_TEST";
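Putting those declarations together, here is a small sketch of what the corrected test could look like. The function names are hypothetical wrappers added only so each result can be checked on its own; the original program printed the values from main instead.

```c
#include <stddef.h>
#include <string.h>

/* Both arrays are initialized from string literals, so each one ends
   with a '\0' and holds a valid string. */
static const char test_string[]  = "TEST";     /* 5 bytes: 4 chars + '\0' */
static const char test_string2[] = "GO_TEST";  /* 8 bytes: 7 chars + '\0' */

/* Length of "TEST", not counting the terminator. */
size_t test_length(void)
{
    return strlen(test_string);
}

/* Length of the leading run of test_string made up only of
   characters that also appear in test_string2. */
size_t test_match(void)
{
    return strspn(test_string, test_string2);
}

/* Size of the array built from a brace list of characters:
   no terminator is added, so it is exactly 7 bytes. */
size_t brace_list_size(void)
{
    char no_terminator[] = { 'G', 'O', '_', 'T', 'E', 'S', 'T' };
    return sizeof no_terminator;
}

/* Size of the array built from the string literal: 7 chars + '\0'. */
size_t literal_size(void)
{
    return sizeof test_string2;
}
```

Here test_length() and test_match() both return 4, while brace_list_size() is 7 and literal_size() is 8. The one-byte difference is the terminating null character that only the string literal supplies.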
Keith Thompson
  • 1
    Thank you for the response! As I had stated for the other two answers, I didn't realize that a) declarations with just a name and size do not place a null character into a string and b) I was calling strlen for the loop without first initializing all of the data in test_string (I mean looking at the code, I could tell you that, I just totally didn't even think about it when I wrote it). Also, thank you for pointing out the issues with the test_string2 init statement. I had thought placing the [] would tell the compiler to put the null character at the end, but seems like this is not the case – Ryan Barker Feb 28 '14 at 02:37
  • 2
    @RyanBarker: A "string" in C is a data layout, not a data type; it's defined as "a contiguous sequence of characters terminated by and including the first null character". An array of `char` may or may not contain a string, and the compiler isn't going to assume that it will hold a string. String literals are (almost always) null-terminated; character arrays are not. – Keith Thompson Feb 28 '14 at 02:39
  • 1
    That is what I was thinking after reading your answer, but thank you so much for confirming it. I really appreciate your help! – Ryan Barker Feb 28 '14 at 02:44
4

strlen counts characters until it reaches the first '\0'. Your test_string is uninitialized, so there is no guarantee that it contains one; strlen just keeps reading until it finds a zero byte, which in your run happened to be 6 bytes from the start of the array.

The compiler does not generate code to initialize the array so you don't have to pay to run that code if you fill it later.

To initialize it to 0 and skip the loop, you can use

char test_string[5] = {0};

This way, all characters will be initialized to 0 and your strlen will work after you have filled the array with "TEST".
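A short sketch of that approach, with hypothetical function names used only for illustration. It relies on the rule that when an initializer lists fewer values than the array has elements, the remaining elements are set to zero.

```c
#include <stddef.h>
#include <string.h>

/* With {0}, every element is zero, so the array already holds
   an empty string before anything is stored in it. */
size_t zero_init_empty_length(void)
{
    char test_string[5] = {0};
    return strlen(test_string);
}

/* Fill the first four bytes; the fifth byte keeps its initial 0
   and acts as the terminator. */
size_t zero_init_filled_length(void)
{
    char test_string[5] = {0};

    test_string[0] = 'T';
    test_string[1] = 'E';
    test_string[2] = 'S';
    test_string[3] = 'T';

    return strlen(test_string);
}
```

zero_init_empty_length() returns 0 and zero_init_filled_length() returns 4.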

Eric Fortin
  • But there is a null in test_string, specifically at test_string[4]. The for loop I set up is even there as a double check for that, unless I'm just missing something. – Ryan Barker Feb 28 '14 at 02:17
  • Not necessarily. I bet the loop does not iterate as many times as you think, since there could be anything in test_string when the program runs. – Eric Fortin Feb 28 '14 at 02:20
  • @RyanBarker Your init loop itself uses strlen which can't be used until you have a null terminator. – TypeIA Feb 28 '14 at 02:20
  • Thank you! I didn't think about that. Changing the condition in the for loop to init < 5 did the trick and now strlen runs correctly. I had thought that the declaration char test_string[5] set a '\0' at test_string[4], but I suppose that doesn't happen automatically as I assumed. – Ryan Barker Feb 28 '14 at 02:26
2

There are a few problems here. First of all, char test_string[5]; simply sets aside 5 bytes for that string, but does not set the bytes to anything. In particular, when you say "char test_string[5] should allocate 5 bytes of memory and place hex 00 into the fifth byte", the second part is wrong.

Secondly, your array initialization loop uses strlen(test_string) but since the bytes of test_string are uninitialized, there's no way to know what's there so strlen(test_string) returns some undefined result. A better way to clear the array would be memset( test_string, 0, sizeof(test_string) );.

You fill the array with "TEST" but don't set the null byte at the end, so the last byte is still uninitialized. If you do the memset above this will be fixed, or you can manually do test_string[4] = '\0'.
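Both fixes can be sketched as follows. The function names are hypothetical, and memcpy is used here only as a compact way to store the four characters; the question assigned them one at a time.

```c
#include <stddef.h>
#include <string.h>

/* Option 1: clear the whole array first, then store the characters.
   The fifth byte stays 0 and terminates the string. */
size_t memset_then_fill(void)
{
    char test_string[5];

    memset(test_string, 0, sizeof(test_string));
    memcpy(test_string, "TEST", 4);  /* copies 4 chars, no terminator */

    return strlen(test_string);
}

/* Option 2: store the characters, then write the terminator by hand. */
size_t fill_then_terminate(void)
{
    char test_string[5];

    memcpy(test_string, "TEST", 4);
    test_string[4] = '\0';

    return strlen(test_string);
}
```

Either way strlen sees a terminator at test_string[4], and both functions return 4.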

Graeme Perrow
  • Sorry I didn't get to responding to this earlier... The other answers came in so fast that I was still responding to the first one. Anyway, thank you. I didn't think about the fact that there very well could be a null somewhere in memory, so I can't use strlen until I make sure that the first null in the string is at the end of it. Furthermore, thanks for letting me know that the declaration doesn't set the null for me. That'll be very handy to know for the future. – Ryan Barker Feb 28 '14 at 02:31