I found the following piece of code embedded in a C++ project. The code goes backwards through a C-style string. When I saw this I thought this should result in undefined behaviour. But it seems to work perfectly:

const char * hello = "Hello World.";
const char * helloPointPos = strchr(hello, '.');
for (const char * curchar = helloPointPos; *curchar; curchar--) {
  printf("%s", curchar);
}

What I was wondering about is the part with *curchar; curchar--. The loop only stops once it reads a \0, so it assumes that the string is immediately preceded by a \0. Is this a legal assumption? Does this piece of code result in undefined behaviour? If not, why not?

I would appreciate it if you could shed some light on this. BTW, the platform is Windows and the compiler is VC++ 2010.

EDIT : Thank you all for your participation. Both answers are very good and helped me. But since I can only accept one answer I will go for paxdiablo's answer since it has more detail. Thank you!

gdiquest

2 Answers

No, it's very much not a requirement that the character before a string be \0, so that code does not have defined behaviour.

In fact, it's doubly undefined, since pointer arithmetic is only defined for pointers within the array or one element past its end, and only pointers within the array may be dereferenced. Since this code dereferences one byte before the array, it's invalid in that sense as well.

It may work in some situations (a) but it's by no means good code.

In any case, the printing of the string rather than the character is going to give you strange results:

.d.ld.rld.orld.World. World.

and so on.
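To see where that output comes from without relying on whatever lies before the string, here is a minimal sketch that collects what printf("%s", curchar) would print at each step, but stops at the start of the string instead of searching for a \0. The helper name suffixes_backwards and its parameters are made up for illustration and are not part of the original code.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper (not from the original post): concatenates what
// printf("%s", cur) would print as cur walks from `from` back to `start`,
// inclusive. Both pointers must point into the same C string.
std::string suffixes_backwards(const char *start, const char *from) {
    assert(start != nullptr && from != nullptr);
    std::string out;
    const char *cur = from + 1;   // one past the first position to print
    while (cur != start) {        // bounded by the start, not by a '\0'
        --cur;                    // step back while still inside the string
        out += cur;               // appends the whole suffix, like "%s"
    }
    return out;
}
```

Called with hello and the result of strchr(hello, '.'), this returns a string that begins with .d.ld.rld.orld.World. World. and ends with the complete Hello World., because every step appends the entire remaining suffix rather than a single character.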

A better reverse iterator would be something like:

const char *curchar = hello + strlen (hello);  // one byte beyond the last character
while (curchar != hello)                        // stop once the start is reached
    putchar (*--curchar);                       // step back, then print just the character
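The same idea can be wrapped in a small function that is easy to check. This is only a sketch: the name reversed is invented here, and it returns a std::string instead of printing so the result can be compared directly. It decrements before reading, so the pointer never moves to a position before the start of the string.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper (not from the original post): returns the characters
// of s in reverse order without ever forming a pointer before s.
std::string reversed(const char *s) {
    assert(s != nullptr);
    std::string out;
    const char *cur = s + std::strlen(s);  // one past the last character
    while (cur != s) {                     // stop once the start is reached
        out.push_back(*--cur);             // step back first, then read
    }
    return out;
}
```

For "Hello World." this yields ".dlroW olleH", and for an empty string the loop body never runs.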

(a) In fact, one of the most annoying things about undefined behaviour is that it sometimes does work, lulling you into a false sense of security.

I've often thought that all coders should have electrical wires hooked up to their most private parts so that undefined behaviour could deliver a short sharp shock - I suspect there would be a lot less undefined behaviour (or far fewer developers) after a while :-)

paxdiablo

It's certainly not defined behavior, but in this case it isn't surprising that it works.

const char * hello = "Hello World."; typically puts the string Hello World. in a section together with all the other string literals. So very likely there's another string literal right before it, and that literal ends with \0, so there's a \0 just before Hello World., and the code works.

Obviously you can't rely on it: your string might be the first in the section, or some non-string constant may precede it. Also, if the string is allocated in any other way, the chances of a \0 sitting right before it are lower.

ugoren
  • You can see an example of a situation where that loop condition will fail in the next piece of code: char * a = "BYE"; char * c = "HELLO"; *(a+3) = 'L'; printf(">%d\t%d\n", *(--c),'\0'); – Pablo Francisco Pérez Hidalgo May 08 '13 at 06:39
  • @PabloFranciscoPérezHidalgo, modifying `*(a+3)` is, in this case, undefined behavior in itself (`"BYE"` is a constant). Anyway, I never said it's guaranteed to have `\0` before a constant string. – ugoren May 08 '13 at 13:23