
For straight C and GCC, why doesn't the pointed-to string get corrupted here?

#include <stdio.h>

int main(int argc, char *argv[])
{
    char* str_ptr = NULL; 

    {
        //local to this scope-block
        char str[4]={0};
        sprintf(str, "AGH"); 

        str_ptr = str;
    }

    printf("str_ptr: %s\n", str_ptr);

    getchar();
    return 0;
}

Output:

str_ptr: AGH

Here's a link to the above code compiled and executed using an online compiler.

I understand that if str_ptr pointed at a string literal, the literal would have static storage (typically in a read-only data section rather than on the stack). But since I am sprintf-ing into a stack-allocated buffer, I expected the string to be purely stack-based, and thus the address to be meaningless after leaving the scope block. I understand that it may take additional stack allocations to overwrite the memory at that address, but even with a function that recursed until a stack overflow occurred, I was unable to corrupt the string pointed to by str_ptr.

FYI I am doing my testing in a VS2008 C project, although GCC seems to exhibit the same behavior.

falsetru

2 Answers


While nasal lizards are a popular part of C folklore, code whose behaviour is undefined can actually exhibit any behaviour at all, including magically resuscitating variables whose lifetime has expired. The fact that code with undefined behaviour can appear to "work" should neither be surprising nor an excuse to neglect correcting it. Generally, unless you're in the business of writing compilers, it's not very useful to examine the precise nature of undefined behaviour in any given environment, especially as it might be different after you blink.

In this particular case, the explanation is simple, but it's still undefined behaviour, so the following explanation cannot be relied upon at all. It might at any time be replaced with reptilian emissions.

Generally speaking, C compilers will make each function's stack frame a fixed size, rather than expanding and contracting as control flow enters and leaves internal blocks. Unless called functions are inlined, their stack frames will not overlap with the stack frame of the caller.

So, in certain C compilers with certain sets of compile options and except for particular phases of the moon, the character array str will not be overwritten by the call to printf, even though the variable's lifetime has expired.
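If the goal is simply to get the same output without relying on any of the above, one option is to give the array a lifetime that covers the read. The sketch below is only an illustration: `read_after_block` is a hypothetical helper, not something from the question, and it uses `strcpy` only because the three characters plus the terminator are known to fit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the question): returns 1 if the text
   read after the inner block matches what was written inside it. */
static int read_after_block(void)
{
    char str[4] = {0};            /* lives until read_after_block returns */
    const char *str_ptr = NULL;

    {
        /* Only the write happens in the inner block; the storage is
           declared outside it, so leaving the block ends nothing. */
        strcpy(str, "AGH");       /* 3 characters plus the NUL fit exactly */
        str_ptr = str;
    }

    assert(str_ptr != NULL);
    printf("str_ptr: %s\n", str_ptr);   /* valid: str is still alive here */
    return strcmp(str_ptr, "AGH") == 0;
}
```

Here the pointer is read while `str` is still within its lifetime, so the result no longer depends on how a particular compiler lays out the stack frame.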

rici
  • This answer would be excellent, if the first sentence were more along the lines of: "While nasal lizards are a popular part of C folklore, code which relies upon undefined behaviour can coincidentally appear to work." – autistic Aug 01 '13 at 03:28
  • @undefinedbehaviour: is your objection to the words "seems to ignore obvious errors"? Otherwise, I'm not sure what the difference is.... – rici Aug 01 '13 at 03:32
  • My objection is to the words "undefined behaviour can be anything at all". That implies that even behaviour which is well-defined can be undefined; Quite scary! `int x = 0;` is not undefined behaviour, for example. "nor an excuse to not correct it" might also read better as "nor an excuse to neglect correcting it". – autistic Aug 01 '13 at 03:38
  • @undefinedbehaviour: I tried a rewording. I'm not sure how you can parse "undefined behaviour" to include well-defined behaviour. I liked "neglect" – rici Aug 01 '13 at 03:43
  • No, undefined behaviour can't be any behaviour. Code which uses it could produce any behaviour. – autistic Aug 01 '13 at 03:44
  • Do you promote anticopulative general semantics? If so, we could probably have an interesting discussion about semantics, during which I might assert that code doesn't produce behaviour; rather, it simply behaves. It might behave gloriously, or deplorably, or in a reptilian fashion. Its behaviour might be surprising, or predictable. Or not defined, and therefore not restricted, by the C standard. Unrestricted behaviour, as far as I can see, could be the same behaviour as exhibited by code with rigidly defined behaviour, which I think justifies my original wording. But anyway, I changed it. – rici Aug 01 '13 at 04:08

Most likely the compiler applies a simple optimization that leaves the string in the same place on the stack. In other words, the compiler lets the stack grow to hold 'str', but it doesn't shrink the stack again within main, because it is not required to do so.

If you really want to see the result of saving the address of variables on the stack, call a function.

#include <stdio.h>

char * str_ptr = NULL;
void onstack(void)
{
    char str[4] = {0};
    sprintf(str,"AGH");
    str_ptr = str;
}

int main(int argc, char *argv[])
{  

    onstack();
    int x = 0x61626364;
    printf("str_ptr: %s\n", str_ptr);
    printf("x:%i\n",x);
    getchar();
    return 0;
}

With gcc -O0 -std=c99 strcorrupt.c I get random output on the first printf. It will vary from machine to machine and architecture to architecture.
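For contrast, a well-defined way to get a string out of a function is to have the caller own the storage. This is a minimal sketch under that assumption; `fill_tag` is a hypothetical replacement for onstack(), not part of the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for onstack(): writes into storage owned by
   the caller. Returns dst on success, or NULL if the buffer is too small. */
static char *fill_tag(char *dst, size_t dst_size)
{
    static const char tag[] = "AGH";

    assert(dst != NULL);
    if (dst_size < sizeof tag)    /* sizeof tag counts the terminating NUL */
        return NULL;
    memcpy(dst, tag, sizeof tag);
    return dst;
}
```

A caller would declare something like `char buf[4];` and pass `buf` and `sizeof buf`. The buffer then belongs to the caller's frame, so it remains valid after `fill_tag` returns, whatever later calls do to the stack below it.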

Eric Urban
  • So can you tell me the difference between the local scope block and moving that out into a function? – user2640330 Aug 01 '13 at 03:00
  • When a compiler generates the code to leave a function, it must have shrunk the stack back to its original size. Otherwise, the calling function would see a corrupted stack. This is known as a calling convention. Almost all conventions allow the caller to assume the stack has not changed after a function call. Information is only left on the stack if the callee has a value to return. Notice the 'onstack' function is declared void. It does not place anything on the stack for the caller to use. To learn more, read: http://en.wikipedia.org/wiki/X86_calling_conventions – Eric Urban Aug 01 '13 at 03:04
  • So basically, because the "calling convention" is not applicable within the scope of the main function, the stack has no requirement to be restored to its original pre-main state (as it would if main were an ordinary function). This equates to the string remaining untouched at the point where the printf occurs? – user2640330 Aug 01 '13 at 03:29
  • That is a reasonable assumption. The compiler is obligated to stop you from directly using `str` outside the scope you created for it. That is really the only guarantee as to what the compiler actually has to do in this situation. You demonstrated that it is possible to create and store a copy of the address of `str`. You are correct in thinking that 'main' is unique in the way it is invoked. It is not called per se, but instead is used as the entry point into the application. If you like my answer, please vote for it. – Eric Urban Aug 01 '13 at 03:33