37

I recently started programming in C again after having programmed in C++ for a while, and my understanding of pointers is a bit rusty.

I would like to ask why this code is not causing any errors:

char* a = NULL;
{
    char* b = "stackoverflow";
    a = b;
}

puts(a);

I thought that because b went out of scope, a should reference a non-existent memory location, and thus there would be a runtime error when the string is printed.

I ran this code in MSVC about 20 times, and no errors were shown.

MattMatt2000
  • This is not undefined. String literals are allocated statically. This code is perfectly fine. – Eugene Sh. Jun 13 '17 at 19:06
  • In other words, `b` has gone out of scope, but not what it pointed to. – Weather Vane Jun 13 '17 at 19:08
  • OTOH, behavior _would_ be undefined if you instead had: `char b[] = { 's', 't', 'a', 'c', 'k', 'o', 'v', 'e', 'r', 'f', 'l', 'o', 'w', '\0' }`. – ad absurdum Jun 13 '17 at 19:14
  • @DavidBowling Or even `char b[] = "stackoverflow";` – Eugene Sh. Jun 13 '17 at 19:15
  • @EugeneSh.-- well, that's just lazy typing ;) – ad absurdum Jun 13 '17 at 19:16
  • Just out of curiosity, how are static strings managed? Do they get inlined in assembly? – MattMatt2000 Jun 13 '17 at 19:19
  • @CometEngine They are likely to sit in either `.rodata` or even in the `.text` section. – Eugene Sh. Jun 13 '17 at 19:20
  • How can data be inlined apart from integer values? Some compilers have an option to share identical string literals. – Weather Vane Jun 13 '17 at 19:20
  • @WeatherVane the generated object file has a section, in ELF called `.rodata`, consisting of `read only data` like string literals and number arrays declared `static const`, which will not change throughout execution. See http://wiki.osdev.org/ELF – cat Jun 13 '17 at 20:28
  • @cat a data section is not "inline". – Weather Vane Jun 13 '17 at 20:29
  • @WeatherVane Oh, I misunderstood that part of your comment. The `.text` section is (normally) also read-only, and you can (not that you should unless you are an optimising compiler) tell the assembler to put char literals in it too, "inline" with the executable code but with a separate label of course – cat Jun 13 '17 at 20:32
  • Possible duplicate of [When a pointer is created in scope, what happens to the pointed to variable when the pointer goes out of scope?](https://stackoverflow.com/questions/14857246/when-a-pointer-is-created-in-scope-what-happens-to-the-pointed-to-variable-when), https://stackoverflow.com/questions/267114/scope-of-string-literals – Cody Gray - on strike Jun 14 '17 at 01:15
  • You are not referencing a char* that went out of scope. – user253751 Jun 14 '17 at 07:22
  • @Cody Gray I don't think tagging the question as duplicate would be a good idea. Sure, it's about the same topic, but the example is different and this question has got way more attention/upvotes. – MattMatt2000 Jun 14 '17 at 12:01
  • Either this is not C code or the code **definitely** will generate errors. Starting with a missing `main`, statements outside a function body and missing prototypes for functions. – too honest for this site Jul 09 '17 at 21:13
  • Does this answer your question? [Scope of (string) literals](https://stackoverflow.com/questions/267114/scope-of-string-literals) – NAND Jun 07 '20 at 00:00

7 Answers

47

Inside the scope where b is defined, it is assigned the address of a string literal. These literals typically live in a read-only section of memory as opposed to the stack.

When you do a=b you assign the value of b to a, i.e. a now contains the address of a string literal. This address is still valid after b goes out of scope.
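To make the value copy explicit, here is a minimal sketch that wraps the same block in a function. The name `copy_out_of_block` is hypothetical and not part of the question; the point is only that the returned pointer refers to the literal, not to b.

```c
#include <stddef.h>

/* Hypothetical helper with the same shape as the code in the question. */
const char *copy_out_of_block(void)
{
    const char *a = NULL;
    {
        const char *b = "stackoverflow"; /* b holds the address of the literal */
        a = b;                           /* copies that address, not b itself */
    }                                    /* b's lifetime ends here */
    return a;                            /* still points at the literal */
}
```

Calling `puts(copy_out_of_block());` is well-defined for the same reason the original code is.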

If you had taken the address of b and then attempted to dereference that address, then you would invoke undefined behavior.

So your code is valid and does not invoke undefined behavior, but the following does:

int *a = NULL;
{
    int b = 6;
    a = &b;
}

printf("b=%d\n", *a);

Another, more subtle example:

char *a = NULL;
{
    char b[] = "stackoverflow";
    a = b;
}

printf(a);

The difference between this example and yours is that b, which is an array, decays to a pointer to the first element when assigned to a. So in this case a contains the address of a local variable which then goes out of scope.
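If the array form is what you need, one well-defined option is to copy the characters out while the array is still alive. This is only a sketch: it assumes a caller-supplied buffer, and `copy_before_scope_ends` is a made-up name.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies the local array into dst before the block ends. */
void copy_before_scope_ends(char *dst, size_t dst_size)
{
    {
        char b[] = "stackoverflow";       /* automatic array, alive until the closing brace */
        snprintf(dst, dst_size, "%s", b); /* copy the contents, not the address */
    }
    /* dst now holds its own copy; nothing refers to b any more */
}
```

With `char buf[32]; copy_before_scope_ends(buf, sizeof buf);`, printing buf afterwards never touches the expired array.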

EDIT:

As a side note, it's bad practice to pass a variable as the first argument of printf, as that can lead to a format string vulnerability. Better to use a string constant as follows:

printf("%s", a);

Or more simply:

puts(a);
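A small sketch of why the constant format string matters. It writes into a buffer rather than stdout so the result can be inspected, and `format_safely` is a hypothetical name. With "%s" as the format, any % characters in the data are copied literally instead of being interpreted.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: text is treated as data, never as a format string. */
int format_safely(char *out, size_t out_size, const char *text)
{
    return snprintf(out, out_size, "%s", text);
}
```

Passing "100%d%s" as text leaves those same seven characters in out, whereas printf(text) with that input would try to read arguments that were never passed.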
dbush
  • thanks for the explanation and the examples; really helped :) – MattMatt2000 Jun 13 '17 at 19:21
  • @CometEngine Glad I could help. Feel free to [accept this answer](https://stackoverflow.com/help/accepted-answer) if you found it useful. – dbush Jun 13 '17 at 19:22
  • I think even the latter code might get optimized to `printf("stackoverflow")` depending on the compiler/switches you're using – Govind Parmar Jun 13 '17 at 19:25
  • @Govind Parmar Just tested on msvc with full optimizations and inlining; seems to be the case. – MattMatt2000 Jun 13 '17 at 19:30
  • This is so interesting. I thought that the variable scoping in C was merely syntactical. –  Jun 13 '17 at 21:05
  • Regarding your edit, it's better yet to use `puts`. :-) – Cody Gray - on strike Jun 14 '17 at 01:11
  • The *variable scoping* is indeed merely syntactical, @Mints97. The issue is just how string literals are stored. – Cody Gray - on strike Jun 14 '17 at 01:12
  • @CodyGray: I'm talking about dereferencing the address of a variable that went out of scope. Why would it even be UB if there are no issues on the stack, i.e. you'd expect an implementation to hold `a` and `b` in the same stack frame? –  Jun 14 '17 at 02:47
  • @Mints97 In theory, if you had another scope block after the first which defined a local variable, it could sit at the same address as `b`. It's all up to the implementation, however. – dbush Jun 14 '17 at 03:24
  • @Cody Gray sure :) puts is even faster than printf; – MattMatt2000 Jun 16 '17 at 17:29
  • @stargateur - dbush may be trying to attract attention to his answer and be rewarded more up-votes. If the answer receives the bounty, no points lost and the benefit of any additional up-votes. – Peter Jul 06 '17 at 10:35
  • @Stargateur I like the step-by-step nature of the answer given by SiggiSv. It makes it easier for beginners to follow. It definitely deserves more than just the 1 upvote it had when the bounty was set. – dbush Jul 06 '17 at 11:14
12

Line by line, this is what your code does:

char* a = NULL;

a is a pointer not referencing anything (set to NULL).

{
    char* b = "stackoverflow";

b is a pointer referencing the static, constant string literal "stackoverflow".

    a = b;

a is set to also reference the static, constant string literal "stackoverflow".

}

b is out of scope. But since a is not referencing b, then that does not matter (it's just referencing the same static, constant string literal as b was referencing).

printf(a);

Prints the static, constant string literal "stackoverflow" referenced by a.

SiggiSv
11

String literals are statically allocated, so the pointer stays valid for the lifetime of the program. If you had said char b[] = "stackoverflow", then you would be allocating a char array on the stack that would become invalid when the scope ended. This difference also shows up when modifying strings: char s[] = "foo" stack-allocates a string that you can modify, whereas char *s = "foo" only gives you a pointer to a string that may be placed in read-only memory, so modifying it is undefined behaviour.
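The modification difference can be sketched as follows. The function name `modify_array_copy` is invented for illustration, and declaring the pointer as `const char *` is a suggestion of mine rather than something from the answer: it lets the compiler reject accidental writes to the literal.

```c
#include <string.h>

/* Hypothetical demo: writes through the array, never through the literal pointer. */
void modify_array_copy(char out[4])
{
    char s[] = "foo";         /* writable array initialized from the literal */
    const char *p = "foo";    /* pointer to the literal itself */

    s[0] = 'b';               /* fine: s is the function's own storage */
    /* p[0] = 'b'; is rejected by the compiler because of const;
       through a plain char * it would be undefined behaviour */
    (void)p;

    memcpy(out, s, sizeof s); /* copy all 4 bytes, terminator included */
}
```

After `char out[4]; modify_array_copy(out);` the buffer holds the modified copy, and the literal itself is untouched.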

kyle
9

Other people have explained that this code is perfectly valid. This answer is about your expectation that, if the code had been invalid, there would have been a runtime error when calling printf. It isn't necessarily so.

Let's look at this variation on your code, which is invalid:

#include <stdio.h>
int main(void)
{
    int *a;
    {
        int b = 42;
        a = &b;
    }
    printf("%d\n", *a); // undefined behavior
    return 0;
}

This program has undefined behavior, but it happens to be fairly likely that it will, in fact, print 42, for several different reasons:

  • Many compilers will leave the stack slot for b allocated for the entire body of main, because nothing else needs the space and minimizing the number of stack adjustments simplifies code generation.
  • Even if the compiler did formally deallocate the stack slot, the number 42 probably remains in memory until something else overwrites it, and there's nothing in between a = &b and *a to do that.
  • Standard optimizations ("constant and copy propagation") could eliminate both variables and write the last-known value for *a directly into the printf statement (as if you had written printf("%d\n", 42)).

It's absolutely vital to understand that "undefined behavior" does not mean "the program will crash predictably". It means "anything can happen", and anything includes appearing to work as the programmer probably intended (on this computer, with this compiler, today).
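For comparison, here is a sketch of a well-defined variant: declaring b in the enclosing block makes its lifetime cover the dereference. The function name `read_through_pointer` is invented for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: same pointer dance, but b outlives the inner block. */
int read_through_pointer(void)
{
    int b = 42;   /* declared in the outer block */
    int *a = NULL;
    {
        a = &b;   /* a points at an object that is still alive below */
    }
    return *a;    /* well-defined: b's lifetime has not ended */
}
```

Here the result no longer depends on what the compiler happens to do with a dead stack slot.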


As a final note, none of the aggressive debugging tools I have convenient access to (Valgrind, ASan, UBSan) track "auto" variable lifetimes in sufficient detail to trap this error, but GCC 6 does produce this amusing warning:

$ gcc -std=c11 -O2 -W -Wall -pedantic test.c
test.c: In function ‘main’:
test.c:9:5: warning: ‘b’ is used uninitialized in this function
    printf("%d\n", *a); // undefined behavior
    ^~~~~~~~~~~~~~~~~~

I believe what happened here was, it did the optimization I described above — copying the last known value of b into *a and then into the printf — but its "last known value" for b was a "this variable is uninitialized" sentinel rather than 42. (It then generates code equivalent to printf("%d\n", 0).)

zwol
3

String literals are always allocated statically, and the program can access them at any time:

char* a = NULL;
{
    char* b = "stackoverflow";
    a = b;
}

printf(a);

Here, the compiler allocates memory for the string literal "stackoverflow" just as it allocates memory for int/char variables or pointers.

The difference is that string literals are typically placed in a read-only section/segment. The variable b is allocated on the stack, but it holds an address inside that read-only section/segment.

In the code, b holds the address of the string literal. Even when b goes out of scope, the memory for the string literal remains allocated.

Note: The memory allocated to string literals is part of the binary and is released only when the program is unloaded.

Refer to the ELF binary specification for more details.
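One observable consequence of static allocation, as a small sketch (`literal_address` is a hypothetical name): the literal is a single object that exists for the whole run, so a function returning its address returns the same address on every call. Whether two separately written identical literals share an address is left to the implementation, so the sketch does not rely on that.

```c
/* Hypothetical helper: returns the address of one statically allocated literal. */
const char *literal_address(void)
{
    return "stackoverflow";
}
```

Comparing `literal_address() == literal_address()` therefore holds, and the characters can be read at any point in the program.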

NAND
anshkun
2

The code doesn't generate any error because you are simply assigning the character pointer b to another character pointer a, and that is perfectly fine.

In C, you can assign one pointer's value to another pointer. Here the string "stackoverflow" is a literal, and the base address of that string is what gets assigned to a.

Although b is out of scope afterwards, the assignment to a has already been made, so the code prints the result without any error.

Jay Patel
2

I think that, as a proof of the previous answers, it is good to take a look at what really sits inside your compiled code. People already mentioned that string literals live in a read-only section of the binary (`.rodata`, or sometimes `.text`). So the literals are simply always there. You can easily check this for the code

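#include <stdio.h>   /* printf is declared here, not in <string.h> */

int main(void) {
  char* a = 0;
  {
    char* b = "stackoverflow";
    a = b;           /* copy the literal's address out of the inner scope */
  }
  printf("%s\n", a);
  return 0;
}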

using the following command

> cc -S main.c

inside main.s you will find, at the very bottom

...
...
...
        .section        __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
        .asciz  "stackoverflow"

L_.str.1:                               ## @.str.1
        .asciz  "%s\n"

You can read more about assembler sections (for example) here: https://docs.oracle.com/cd/E19455-01/806-3773/elf-3/index.html

And here you can find very well prepared coverage of Mach-O executables: https://www.objc.io/issues/6-build-tools/mach-o-executables/

Oo.oO