NULL behavior when function returns address of local variable in C

Question

I have the following C code:

#include <stdlib.h>
#include <stdio.h>

char* foo() {
    char abc[4] = "abc";
    return abc;
}

int main() {
    printf("%s", foo());
    return 0;
}

If I compile it with gcc and run the executable, I get (null)% as output.

If I run the slightly modified code:

#include <stdlib.h>
#include <stdio.h>

char* foo() {
    char abc[4] = "abc";
    return abc;
}

int main() {
    printf("%c", *(foo()));
    return 0;
}

I got a segmentation fault.

My question is: why wouldn't my first code get a segmentation fault? I'm running Linux and gcc version: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0

Both programs, when compiled, generate a warning: function returns address of local variable [-Wreturn-local-addr]

Re “why wouldn't my first code get a segmentation fault?”: In both cases, it appears the program acted as if a null pointer were returned by `foo`. (This behavior is not defined by the C standard, may be the result of optimizer behavior, and is not something you should rely on.) When you passed the pointer to `printf` for `%s`, the `printf` implementation checked whether it was a null pointer and printed “(null)” instead of attempting to dereference it. When you attempted to dereference the pointer yourself to pass to `printf` for `%c`, there was no preliminary check, so the program crashed. — Eric Postpischil, Sep 12 '20 at 17:15
The question "why doesn't this obviously incorrect program crash" never has an interesting answer. It just got lucky this time. Move along, citizen, nothing to see here. Don't write incorrect programs. Obviously this is easier said than done, but the very least you can do is [pay attention to compiler warnings](https://stackoverflow.com/questions/57842756/why-should-i-always-enable-compiler-warnings). — n. m. could be an AI, Sep 12 '20 at 17:58
@n.'pronouns'm.: “The question "why doesn't this obviously incorrect program crash" never has an interesting answer” is false. There are things to be learned, including things about how compilers work, how linkers work, how operating systems work, and more. In this case, at least three of the answers were wrong, so clearly there was information to be learned. In general, learning about the causes of crashes helps diagnose future bugs, thus speeding, improving, and reducing the cost of software development. — Eric Postpischil, Sep 12 '20 at 18:47
@n.'pronouns'm.: No, it is not obvious. There are multiple things to learn there. One is the rule that was violated. It was not that dereferencing a pointer to an object whose lifetime has ended is undefined. The rule that was violated is that using such a pointer has undefined behavior. Three answers got that wrong, which means three people, and possibly more readers, did not know or notice a rule that could have caused other programs to misbehave. Learning the correct rule, and learning to recognize it, is useful for avoiding bugs. — Eric Postpischil, Sep 12 '20 at 23:57
@n.'pronouns'm.: Another thing to learn is how the behavior manifested. The compiler did not do a simple thing here. A straightforward implementation of the code would have left an address behind. It did not. Many people are not aware the compiler makes such overarching transformations of a program. They learn a simple model of C computing taught in classes and see optimization as things like consolidation of common subexpressions, or maybe rewriting some arithmetic. Learning that today’s compilers make large abstract transformations is new. — Eric Postpischil, Sep 12 '20 at 23:59
@n.'pronouns'm.: Another thing people learn when asking about undefined behavior is compiler extensions. We have repeatedly seen people ask why they can define `int x;` in multiple modules without getting multiple definition errors when linking. And the answer there is that Unix tools support “common” symbols. They defined the behavior even though the C standard did not. — Eric Postpischil, Sep 13 '20 at 00:01
@n.'pronouns'm.: Additionally, any good programmer has to learn to trace symptoms back to causes. When a program misbehaves, you cannot throw up your hands and say, well, there is no way to diagnose this; it is a consequence of undefined behavior, and you cannot reason about that, so the only solution is to scour the source code for a mistake. That is not how it works. A good programmer conducts experiments. Many of them still have behavior not defined by the C standard but yield clues nonetheless. They learn from the clues, both in diagnosing the instant problem and… — Eric Postpischil, Sep 13 '20 at 00:02
… in learning how the compiler and other tools behave so they can better diagnose other problems in the future. So, this fiction that there is nothing to be learned about inquiring into “undefined” behavior is nonsense. It arises not from fact or experience but from some myth that the specification of the C standard is the be-all and end-all, and there is no knowledge outside it. It is baloney. — Eric Postpischil, Sep 13 '20 at 00:02
@n.'pronouns'm.: Why the program is not crashing is not the question here in the comments. My answer to that question is in the answer I posted. The question being addressed here is whether the answer is interesting. It is, and similar questions also are, as I have explained and demonstrated, because it brings to light useful information: In this case, a rule in the C standard that was missed or misunderstood is revealed. In others, one may learn about extensions or other behaviors. The suggestion “Just fix it” says to solve one problem only and to neglect what else can be learned. — Eric Postpischil, Sep 13 '20 at 13:03
@n.'pronouns'm.: So that is not a disproof that there is interesting information. It is just a decision to ignore what can be learned. — Eric Postpischil, Sep 13 '20 at 13:08
@n.'pronouns'm.: Re “it doesn't answer the question that was asked”: That is irrelevant because it is not the issue under discussion here, and the rest of your comment follows on in that irrelevant path. — Eric Postpischil, Sep 13 '20 at 17:39
@EricPostpischil I started by discussing possible answers to the question that was asked, and I don't think I strayed anywhere from there. If this is not the issue you are discussing, fine, but this is most definitely the issue I thought I was discussing. I will delete my part because it doesn't further any understanding. — n. m. could be an AI, Sep 13 '20 at 18:42
Gosh... I didn't think such an entry-level question would generate so many responses. I think I am a rather experienced C programmer. Today I was helping a friend with their intro to C class and accidentally landed on this question. I, of course, understand what a correct program should do (use dynamic memory allocation) but was intrigued by this behavior. Thanks to @EricPostpischil for pointing out the compiler optimization. I didn't know gcc can optimize in this way. — Go_printf, Sep 14 '20 at 13:42

Eric Postpischil · Accepted Answer · 2021-06-09T15:11:28.320

4

At the moment `return abc;` starts to execute, `abc` is a pointer to an array defined inside `foo`. (Formally, it designates the array itself, but it is automatically converted to the address of the first element.) The function would be returning this pointer value. However, when execution of the function ends, the lifetime of the array ends.

Per C 2018 6.2.4 2:

The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.

When a value is indeterminate in C, it may behave as if it has any value, including having a different value each time you attempt to use it or having a trap value (C 2018 3.19.2 and 3.19.3). Note that this does not just mean what the pointer value points to is indeterminate; the value of the pointer itself is indeterminate.

So, even if `abc` had some address in memory, say 100400, that does not mean 100400 is returned to the caller. The value returned to the caller is indeterminate: It can be anything, including a null pointer value.

It appears your compiler’s optimizer has responded to the undefined behavior in your code by providing or allowing a null pointer value as the return value of the function `foo`. This is allowed by the C standard.

When you passed this null pointer to `printf` for use with `%s`, your `printf` implementation checked the pointer, saw it was a null pointer, and printed “(null)” instead of attempting to use it to access a string in memory.

When you tried to dereference the pointer, using `*(foo())`, there was no preliminary check of the pointer value. The machine code of the program attempted to use the null pointer to access memory, and this resulted in a segmentation fault.
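
The difference between the two cases can be sketched by making the null check explicit. This is only an illustration, not the actual code of any `printf` implementation: `str_or_null` and `first_char_or` are hypothetical helper names, and printing “(null)” for a null `%s` argument is a courtesy of some implementations, not something the C standard guarantees.

```c
#include <stdio.h>

/* Hypothetical helper: substitute a placeholder for a null pointer,
   similar in spirit to what some printf implementations do for %s. */
const char *str_or_null(const char *s)
{
    return s != NULL ? s : "(null)";
}

/* Hypothetical helper: check before dereferencing, because a plain *p
   performs no check of its own. Returns fallback when p is null. */
char first_char_or(const char *p, char fallback)
{
    return p != NULL ? *p : fallback;
}
```

With these helpers, `printf("%s\n", str_or_null(p));` prints `(null)` when `p` is a null pointer, and `first_char_or(p, '?')` yields `'?'` where a bare `*p` would access memory through the null pointer.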

edited Jun 09 '21 at 15:11

answered Sep 12 '20 at 17:46

Eric Postpischil


I have a follow-up question: on another machine where I repeatedly run these two programs, the first generates different nonsense strings on each run (this is expected). The second, however, always successfully prints `a%`. What would be a good explanation of this? Would that compiler choose to somehow retain the actual value of the pointer pointing to `abc`, and give the caller access to that memory? Intuitively I would say that should not be a form of undefined behavior, or at least not a good undefined behavior? Or, is there something else going on here? – Go_printf Sep 14 '20 at 13:48
@Go_printf: Is that “%” of “a%” a typo? Because printing “a” can easily happen: `foo` is called, it initializes an array `abc` to contain “abc”, it returns the address of the first element of that array, this address survives compiler optimization, the caller dereferences it to get “a”, the caller passes that to `printf`, and `printf` prints “a”. – Eric Postpischil Sep 14 '20 at 13:51
"Is that “%” of “a%” a typo?" The percentage sign appears in a contrasting background. I think that happens because I didn't add `\n` in `printf`. By "surviving compiler optimization" you mean that this compiler optimized to return the actual address of `abc`? – Go_printf Sep 14 '20 at 14:01
@Go_printf: Yes, if you do not print a new-line, the cursor will be left after the printed text when the program ends, after which the command-line shell will print its prompt, which may be the “%” character. By “surviving compiler optimization” I mean that part of the way in which a compiler may work is by generating code similar to what a human would write and then applying optimization techniques to it—and surviving this means that the initially generated code persists through the result of optimization, rather than being transformed into something different. – Eric Postpischil Sep 14 '20 at 14:04

score 1 · Answer 2 · answered Sep 12 '20 at 15:16

Because you are creating a local variable abc, that variable is only valid within the scope of the function foo. Returning the address of that variable makes no sense, because as soon as you return from foo the address is no longer valid. Also keep in mind that C implementations typically use the stack to pass arguments to functions and to return values from them. The local variable is also created on the stack, which is modified by the function call mechanism, so using that address can eventually corrupt the stack.
To create pointers you should use heap allocation (using the malloc family of functions), or you must ensure the variable is still inside an existing scope at the time you use it.
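
The two options just mentioned might look like the following. This is a minimal sketch, assuming the goal is simply to hand the string "abc" back to the caller; `foo_heap` and `foo_buf` are hypothetical names that do not appear in the question.

```c
#include <stdlib.h>
#include <string.h>

/* Heap allocation: the object lives until free() is called, so the
   returned pointer remains valid after the function returns.
   Returns NULL if allocation fails; the caller must free the result. */
char *foo_heap(void)
{
    char *abc = malloc(4);
    if (abc != NULL) {
        memcpy(abc, "abc", 4);   /* copies 'a', 'b', 'c', '\0' */
    }
    return abc;
}

/* Caller-provided buffer: the array belongs to the caller, so it is
   still alive when the caller uses it. Returns NULL if buf is null
   or too small to hold the string and its terminator. */
char *foo_buf(char *buf, size_t size)
{
    if (buf == NULL || size < 4) {
        return NULL;
    }
    memcpy(buf, "abc", 4);
    return buf;
}
```

The heap version puts the burden of calling `free` on the caller; the buffer version avoids allocation but requires the caller to supply enough space.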

Objects do not exist outside their **lifetimes**, not their **scopes**. Scopes are where names are visible. Lifetimes are when objects exist. An object can be accessed in code outside its scope during its lifetime by passing a pointer, as when passing the address of an object to a subroutine. — Eric Postpischil, Sep 12 '20 at 15:30
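
The distinction drawn in this comment can be illustrated with a short sketch. The function names `fill` and `first_letter` are hypothetical; the point is only that `fill` accesses the array while the name `abc` is out of scope but the array's lifetime has not ended.

```c
#include <stddef.h>

/* The name `abc` is not in scope here, yet the array can be accessed
   through the pointer because the caller's array is still alive. */
static void fill(char *dst, size_t size)
{
    if (size >= 4) {
        dst[0] = 'a';
        dst[1] = 'b';
        dst[2] = 'c';
        dst[3] = '\0';
    }
}

/* `abc` is in scope only inside this function, and its lifetime lasts
   until the function returns. Passing its address down to fill() is
   fine; only using the address after the return would be a problem. */
char first_letter(void)
{
    char abc[4] = "";
    fill(abc, sizeof abc);
    return abc[0];   /* returns a char value, not an address */
}
```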

score 1 · Answer 3 · answered Sep 12 '20 at 15:17

Your second code invokes undefined behavior as you try to dereference a pointer which points to a local variable. Now this local variable doesn't exist outside its scope. Thus, the memory isn't valid.

In the first code, you try to access a local variable outside its scope. In this case the function is expected to return a char *. As you return the address of a local variable, what gets printed is null, which doesn't cause a segmentation fault.

ameyCU

Objects do not exist outside their **lifetimes**, not their **scopes**. Scopes are where names are visible. Lifetimes are when objects exist. An object can be accessed in code outside its scope during its lifetime by passing a pointer, as when passing the address of an object to a subroutine. – Eric Postpischil Sep 12 '20 at 15:30

score 0 · Answer 4 · answered Sep 12 '20 at 21:53

Consider the following sequence of events:

You check into a hotel, and you’re placed in room 137.
You tell the front desk to call your friends and invite them to a wild dance party in your room tomorrow at 2AM.
However, your reservation doesn't last until 2AM tomorrow. Perhaps there's another reservation for a different guest. Perhaps the room will stay vacant. Who knows. Maybe the front desk people know. Maybe they don't.

So what should the front desk do?

They could still send an invitation that indicates room 137, perhaps not knowing that you won't be there at that time, because they forgot to check their reservation records. Or maybe they just don't care.

Or they could refuse to send an invitation and tell you that.

Or perhaps they could just ignore your request, not send anything, and not tell anyone.

Or they could send an invitation, but indicate a bogus room number. Perhaps they have invitation blanks prepared beforehand, and they need to just fill in the time and the room number. But being technologically advanced as they are, they won't fill in a room number if they know it is not reserved to this particular guest, and will send out one with a default room number instead (zero, perhaps?).

Perhaps if we live in the future, they might even send an electronic invitation with room 137 indicated in it — that will self-destruct the moment you check out from the hotel!

Whatever they do, they cannot send an invitation indicating a correct room number, because there is no correct room number. You won't be at any room number. So they do whatever. They may always choose one strategy to deal with this situation. Or they may flip a coin. Or perhaps different staff members will do different things. Who knows.

So some of their strategies will produce a spectacular crash (your friend wakes up the wrong guest at the wrong time, they call the police, and it all doesn't end well).

Other strategies will produce less dramatic outcomes. Refuse to continue and let you know? Let you know something is wrong, but continue anyway? Ignore a dangerous instruction? Replace it with a less dangerous instruction? All of these things are possible.

This corresponds to what a compiler might do when you instruct it to do an obviously dangerous and illegal thing. Ignore the danger, or refuse to continue with a diagnostic message, or produce a diagnostic message and continue anyway, or skip the dangerous instruction altogether (but only if it is 100% sure the destruction is imminent), or tweak it slightly so that it is less dangerous. Real compilers actually do all of these things in different circumstances. The important thing is to know that asking a compiler for an impossible thing doesn't always result in the program actually attempting to do the impossible thing.

This answer is in part based on the answer https://stackoverflow.com/a/63862176/775806 which was deleted by its author.

score -1 · Answer 5 · answered Sep 12 '20 at 16:00

Dereferencing an object which does not exist is undefined behaviour.

Why the first works: my guess is that the compiler has optimized out the call to the function.

0___________
