GCC - modify where execution continues after function return

Question

Is it possible to do something like this in GCC?

void foo() {
    if (something()) returnSomewhereElse;
    else return;
}

void bar() {
    foo();
    return; // something failed, no point of continuing
    somewhereElse:
    // execution resumes here if something succeeds
    // ...
}

Can this be achieved in a portable way, using C and GCC extensions, without using platform-specific assembly?
The stack state will not change between the normal and the altered return points, so is it possible to reuse the code that restores the stack and register state on a regular return?
Considering that the function may or may not be inlined: if it is called, it must alter the return address; if it is inlined, it must only alter the code path, not the current function's return address, as that would break the code.
The alternate return point does not need to be a label, but I am hoping GCC's address-of-label extension can come in handy in this situation.

Just to clarify the intent: it is about error handling. This example is a minimal one, just to illustrate things. I intend to use this in a much deeper context, to stop execution if an error occurs. I also assume the state doesn't change, which I may be wrong about, since no extra local variables are added between the two return points. So I was hoping the code the compiler generates for foo's return could be reused for that, saving the overhead of longjmp and of setting and passing the jump buffers.

The example "does make sense" because its intent is to show what I want to achieve, not why and how it would make sense in actual code.

Why is your idea simpler or better than simply returning a value from foo() and having bar() either return or execute somewhereElse: conditionally?

It is not simpler, and what you suggest applies only in the context of a trivial example, not in practice. But it is better because:

1 - it doesn't involve an additional returning of a value

2 - it doesn't involve an additional checking of a value

3 - it doesn't involve an additional jump

I am probably wrong to assume that the goal should be clear at this point, after all the clarifications and explanations. The idea is to provide an "escape code path" out of a deep call chain without any additional overhead. This would be done by reusing the code the compiler has generated to restore the state of the previous call frame, and simply modifying the instruction at which execution resumes after the function returns. Success skips the "escape code path"; the first error that occurs enters it.

if (failure) return; // right into the escape code path
else {
    doMagickHere(); // to skip the escape code path
    return; // skip over the escape code path
}

//...
void bar() {
    some locals;
    foo();
    // enter escape code path here on foo failure so
    destroy(&locals); // cleanup
    return; // and we are done
    skipEscapeCodePath: // resume on foo success
    // escape path was skipped so locals are still valid
}

As for Basile Starynkevitch's claims that longjmp is "efficient" and that "Even a billion longjmp remains reasonable": sizeof(jmp_buf) gives me a hefty 156 bytes. That is apparently the space needed to save pretty much all registers and a bunch of other state so it can be restored later. That is a lot of operations, and doing them a billion times is far, far outside my personal understanding of "efficient" and "reasonable". A billion jump buffers alone take over 145 GiB of memory (10^9 × 156 bytes), and then there is the CPU time overhead as well. Not many systems out there can afford that kind of "reasonable".

This does not make any sense. What are you trying to achieve? — Ed Heal, Nov 22 '15 at 11:05
Have the function return a value and decide what to do in the calling function based on that value. — Unimportant, Nov 22 '15 at 11:06
@EdHeal - the question is not whether it makes sense to you. The goal obviously is error handling, I am hoping to achieve the interruption of a deep call chain without having to check return values. — IvanB, Nov 22 '15 at 11:08
what do you mean by "portable"? I know that gcc has a feature for goto across functions, but it won't work on other compilers. — OznOg, Nov 22 '15 at 11:08
@user1320881 - it seems applicable in this scenario, but it is just a small example, and is not applicable in the scenario I intend to use it. — IvanB, Nov 22 '15 at 11:08
@IvanB - Apparently it does not make any sense to a few other people as well — Ed Heal, Nov 22 '15 at 11:09
@OznOg - I added GCC as a tag to specify it is a given. So by portable I mean platform portability while still using GCC. — IvanB, Nov 22 '15 at 11:10
Why is your idea simpler or better than simply returning a value from `foo()` and having `bar()` either return or execute `somewhereElse:` conditionally? The point is your example does not justify the mechanism when sensible design using basic mechanisms of the language suffice in your example situation. More generally perhaps C++ exception handling is what you need. — Clifford, Nov 22 '15 at 11:29
@Clifford - yes it is just an example, its point is to illustrate what I want to achieve, not how it makes sense in actual code. Unfortunately, I cannot use C++. — IvanB, Nov 22 '15 at 11:38
You are reinventing `longjmp` if you want that to be portable and general (and compatible with recursive functions) — Basile Starynkevitch, Nov 22 '15 at 12:59
@BasileStarynkevitch - well, not all of our systems come with a spare 150 gigs of ram to afford to be reasonable with a billion `longjmp`s. — IvanB, Nov 22 '15 at 13:07
A billion `longjmp` will generally use a constant memory space of a hundred bytes. They won't even spend a single megabyte, and the billion calls to `longjmp` will require a few seconds in total, each of them lasting a few dozen nanoseconds... — Basile Starynkevitch, Nov 22 '15 at 13:10
@BasileStarynkevitch - somehow and amazingly so you are STILL missing the point that it is about error handling and achieving in practice stack unwinding. So those billion jumps will require a billion buffers. In fact the buffers will end up being there even if no jumps are made. So to my question about zero cost error handling you are answering with a solution with an impossible cost. — IvanB, Nov 22 '15 at 13:11
No, my improved answer shows that there is no billion buffer in heap, only one single `jmp_buf` *temporarily* on the call stack. — Basile Starynkevitch, Nov 22 '15 at 13:57
So how do you trace back your N > 1 number of steps with only one jump buffer? Nobody is talking about heap, but the deeper you go, the more buffers you leave behind so you can make the jump if you have to. You can at best limit it to one buffer for each stack frame, reusing the same buffer if there are more than one error prone operations per frame. — IvanB, Nov 22 '15 at 14:04
Please read more about `longjmp`. It can unwind tens of thousands of call frames in constant time, and most call stacks don't have that many. A typical call stack cannot exceed a few megabytes on most laptops — Basile Starynkevitch, Nov 22 '15 at 14:07
I am well aware of the typical stack sizes, but even though the stack as a memory buffer is a flat object, what happens inside has a tree structure. Storing a jump buffer on every frame in a deep call chain will run out of stack space in no time. Since even if the buffer only stores a few registers, it is still a constant size construct. Might have been nice if the compiler actually trimmed it down to only what is needed to store a particular state. — IvanB, Nov 22 '15 at 14:13
See my example. You don't store a `jmp_buf` in most call frames, only in some initial ones. — Basile Starynkevitch, Nov 22 '15 at 15:35
As a caution: let's keep this discussion civil, and avoid pointed comments about other users. — Brad Larson, Nov 22 '15 at 16:08
@BasileStarynkevitch if you don't store each of the steps then the stack unwinding will not collect all the objects in between, it will all leak, which is why your solution is utterly useless. The goal is not to get back to where you started, it is to do so AND properly collect all objects along the call stack. This will require a LOT of jump buffers and will result in massive CPU and MEMORY OVERHEAD. — IvanB, Jan 13 '16 at 21:23

Basile Starynkevitch · Answer 1 · 2015-11-22T17:09:20.297

No, this is not possible portably, and I'm not sure I can guess exactly what you want to achieve.

Terminology

Perhaps you want some non-local jump. Read carefully about setjmp.h, coroutines, call stack, exception handling, continuations, and continuation-passing-style. Understanding what call/cc is in Scheme should be very beneficial.
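The continuation-passing-style reference can be made concrete. Below is a minimal sketch (not taken from the answer) of a function that receives two explicit continuations, one for success and one for failure. This is roughly what "two return points" looks like in portable C. All names (`parse_digit`, `store_ok`, `store_fail`, `parse_one`) are hypothetical. C does not guarantee tail calls, so each continuation call still pushes a frame.

```c
#include <stdbool.h>

/* A continuation: a function pointer plus an opaque context pointer. */
typedef void (*cont_fn)(void *ctx, int value);

/* Hypothetical step: instead of returning a status code, it "returns"
   by calling one of the two continuations supplied by its caller. */
static void parse_digit(char c, void *ctx, cont_fn on_ok, cont_fn on_fail)
{
    if (c >= '0' && c <= '9')
        on_ok(ctx, c - '0');               /* success continuation */
    else
        on_fail(ctx, (unsigned char)c);    /* escape continuation */
}

struct result { int value; bool failed; };

static void store_ok(void *ctx, int value)
{
    struct result *r = ctx;
    r->value = value;
    r->failed = false;
}

static void store_fail(void *ctx, int bad_char)
{
    struct result *r = ctx;
    r->value = bad_char;
    r->failed = true;
}

/* Caller: chooses both "return points" explicitly. */
struct result parse_one(char c)
{
    struct result r = { 0, false };
    parse_digit(c, &r, store_ok, store_fail);
    return r;
}
```

With this sketch, `parse_one('7')` yields `failed == false` and `value == 7`, while `parse_one('x')` goes through the failure continuation instead.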

`setjmp` and `longjmp`

setjmp and longjmp are standard C99 facilities, and they are quite fast because the saved state is actually quite small. Be very careful when you use them, in particular to avoid memory leaks. longjmp (or the related siglongjmp in POSIX) is the only way in portable standard C99 to escape from a function and get back into one of its callers.

The idea is to provide an "escape code path" from a deep call chain without any additional overhead

This is exactly the role of longjmp with setjmp. Both are quick, constant-time operations; in particular, unwinding a call stack of many thousands of call frames with longjmp takes a short and constant time. The memory overhead is practically one local jmp_buf per catch point, which is not a big deal. The jmp_buf is rarely put outside of the call stack.
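Here is a minimal, self-contained sketch of that claim (illustrative, not the answer's own code). There is one jmp_buf at the catch point and none in the intermediate frames, and a single longjmp discards all of those frames at once. The names `catch_st`, `descend` and `run_escape` are hypothetical. `err` is `volatile` because it is written between setjmp and longjmp and read afterwards.

```c
#include <setjmp.h>

/* A minimal catch point: the only jmp_buf in the whole call chain. */
struct catch_st {
    jmp_buf jb;
    volatile int err;   /* volatile: modified between setjmp and longjmp */
};

/* Hypothetical worker: recurses 'depth' levels without any jmp_buf of its
   own, then escapes with 'err' if it is non-zero. */
static void descend(struct catch_st *c, int depth, int err)
{
    if (depth > 0) {
        descend(c, depth - 1, err);
        return;
    }
    if (err != 0) {
        c->err = err;
        longjmp(c->jb, 1);   /* discards all 'depth' frames in one jump */
    }
}

/* Returns 0 if descend() returned normally, else the error it raised. */
int run_escape(int depth, int err)
{
    struct catch_st c;
    c.err = 0;
    if (setjmp(c.jb) != 0)   /* allowed context: compared in an if */
        return c.err;        /* escape path */
    descend(&c, depth, err);
    return 0;                /* normal path */
}
```

`run_escape(10000, 42)` returns 42 after a single longjmp. Calling it many times does not grow memory use, because the only jmp_buf lives in `run_escape`'s own frame.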

A common way to use them efficiently is to put the setjmp-ed jmp_buf in a local struct (so in your call frame). You then pass a pointer to that struct to some internal static function(s), which (possibly indirectly) call longjmp on error. With wise coding conventions, setjmp and longjmp can thus mimic, quite well and efficiently, the complex semantics of C++ exception throwing and handling. They can also mimic OCaml exceptions or Java exceptions, both of which have semantics different from C++'s. They are portable basic building blocks, sufficient for such a purpose.

Practically speaking, code something like:

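  #define _GNU_SOURCE     // for vasprintf (GNU/BSD extension); must precede any #include
  #include <setjmp.h>
  #include <stdarg.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  struct some_arg_st;     // public argument type, defined elsewhere

  struct my_foo_state_st {
    jmp_buf jb;
    char *volatile rs;    // error message; volatile since it is set between setjmp and longjmp
    // some other state, e.g. a `FILE*` or whatever
  };

  /// returns a `malloc`-ed error message on error, and NULL on success
  extern const char* my_foo (struct some_arg_st* arg);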

The struct my_foo_state_st is the private state. my_foo is the public function, which you would declare in some public header. You documented (at least in the comment) that it returns a heap-allocated error message on failure, so the caller is responsible for freeing it. You also documented that it returns NULL on success. Of course, you could have other conventions and other arguments and/or results.

We now declare and implement an error function that prints the error message into the state and escapes with longjmp:

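  static void internal_error_printf (struct my_foo_state_st *sta,
       int errcode,
       const char *fmt, ...)
   __attribute__((noreturn, format(printf, 3, 4)));

  static void internal_error_printf (struct my_foo_state_st *sta,
       int errcode, const char *fmt, ...) {
    char *msg = NULL;
    va_list args;
    va_start(args, fmt);
    if (vasprintf(&msg, fmt, args) < 0)
      msg = NULL;               // allocation failed: report no message
    va_end(args);
    sta->rs = msg;              // via a temporary, since rs is volatile-qualified
    longjmp(sta->jb, errcode);  // never returns; errcode becomes setjmp's result
  }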

We now have several possibly complex and recursive functions doing the bulk of the work. I only sketch them; you know what you want them to do. Of course you might want to give them some additional arguments (that is often useful, and it is up to you).

  static void my_internal_foo1(struct my_foo_state_st*sta) {
    int  x = 0, y = 0;
    // do something complex before that and compute x,y
    if (SomeErrorConditionAbout(sta))   // placeholder predicate
       internal_error_printf(sta, 35 /*error code*/,
                            "error: bad x=%d y=%d", x, y);
    // otherwise do something complex after that, and mutate sta
  }

  static void my_internal_foo2(struct my_foo_state_st*sta) {
    // do something complex 
    if (SomeConditionAbout(sta))
       my_internal_foo1(sta);
    // do something complex and/or mutate or use `sta`
  }

(Even if you have dozens of internal functions like the ones above, none of them consumes a jmp_buf, and you can recurse quite deeply through them. You just need to pass a pointer to struct my_foo_state_st to all of them. If you are single-threaded and don't care about reentrancy, you could instead store that pointer in some static variable, or in a thread-local one, without passing it as an argument at all. Still, I find passing it explicitly preferable, since it is more reentrant and thread-friendly.)
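As an aside to the parenthetical note above, here is a minimal sketch of the thread-local alternative. It assumes C11 `_Thread_local`, and all names are hypothetical. The outer catch point is saved and restored, so nested uses keep working. The state is owned by the caller of the function that invokes setjmp, so `st.err` keeps its value after longjmp without needing `volatile`.

```c
#include <setjmp.h>
#include <stddef.h>

/* Error-handling state for one catch point. */
struct tl_state {
    jmp_buf jb;
    int err;
};

/* Innermost active catch point of the current thread. */
static _Thread_local struct tl_state *current = NULL;

/* Callable at any depth without receiving the state as an argument.
   Assumes a catch point is active (current != NULL). */
static _Noreturn void tl_throw(int err)
{
    current->err = err;
    longjmp(current->jb, 1);
}

/* Hypothetical worker: treats a negative input as an error. */
static void tl_leaf(int x)
{
    if (x < 0)
        tl_throw(-x);
}

/* Installs st as the current catch point and restores the outer one on
   both exits, so nested (reentrant) uses keep working. */
static int tl_guarded(struct tl_state *st, int x)
{
    struct tl_state *outer = current;   /* not modified after setjmp */
    current = st;
    if (setjmp(st->jb) != 0) {
        current = outer;                 /* escape path */
        return st->err;
    }
    tl_leaf(x);
    current = outer;                     /* normal path */
    return 0;
}

/* Public entry: owns the state, so st.err survives the longjmp. */
int tl_check(int x)
{
    struct tl_state st = { .err = 0 };
    return tl_guarded(&st, x);
}
```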

Finally, here is the public function: it sets up the state and calls setjmp.

  // the public function
  const char* my_foo (struct some_arg_st* arg) {
     struct my_foo_state_st sta;
     memset(&sta, 0, sizeof(sta));
     // setjmp may only appear in a few contexts (C99 7.13.1.1), e.g. compared
     // with a constant as the whole controlling expression of an `if`
     if (setjmp(sta.jb) == 0) { // first call
       /// put something in `sta` related to `arg`
       /// start the internal processing
       //// later,
       my_internal_foo1(&sta);
       /// and other internal functions, possibly recursive ones
       /// we return NULL to tell the caller that all is ok
       return NULL;
     }
     else { // error recovery, reached through longjmp
       /// possibly release internal consumed resources
       return sta.rs;
     }
     abort(); // this should never be reached
  }

Notice that you can call my_foo a billion times. It consumes no heap memory when it does not fail, and the stack grows by a few hundred bytes, released before my_foo returns. Even if it fails a billion times because your private code calls internal_error_printf a billion times, no memory leak happens, provided the code is written properly. This is because you documented that my_foo returns an error string that the caller should free.
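The "written properly" part deserves a sketch. The objection raised in the comments is that unwinding many frames with a single jmp_buf leaks whatever those frames own. One possible convention (an assumption, not something the answer prescribes) is to keep a small cleanup stack in the state struct. Each frame registers its resources there, and the error path releases everything still registered before its single longjmp. The names and the fixed capacity `MAX_CLEANUPS` are hypothetical.

```c
#include <setjmp.h>
#include <stdlib.h>

#define MAX_CLEANUPS 64   /* arbitrary fixed capacity for this sketch */

struct cleanup {
    void (*fn)(void *);
    void *arg;
};

/* Error-handling state, owned by the caller of the function that calls
   setjmp, so its members keep their values after longjmp. */
struct eh_state {
    jmp_buf jb;
    struct cleanup pending[MAX_CLEANUPS];   /* innermost resource last */
    int npending;
    int err;
};

/* Register a resource to release if an error escapes past this frame. */
static void eh_push(struct eh_state *st, void (*fn)(void *), void *arg)
{
    if (st->npending == MAX_CLEANUPS)
        abort();                            /* sketch: no growth */
    st->pending[st->npending++] = (struct cleanup){ fn, arg };
}

/* Unregister the innermost resource and release it. */
static void eh_pop_and_run(struct eh_state *st)
{
    struct cleanup c = st->pending[--st->npending];
    c.fn(c.arg);
}

/* Release everything still registered (innermost first) while the owning
   frames are still alive, then jump to the single catch point. */
static _Noreturn void eh_throw(struct eh_state *st, int err)
{
    while (st->npending > 0)
        eh_pop_and_run(st);
    st->err = err;
    longjmp(st->jb, 1);
}

static int released;                        /* counts releases, for the demo */

static void release(void *p)
{
    free(p);
    released++;
}

/* Hypothetical worker: every level owns one heap block, none owns a jmp_buf. */
static void work(struct eh_state *st, int depth, int fail_at)
{
    void *block = malloc(16);
    if (block == NULL)
        eh_throw(st, 1);
    eh_push(st, release, block);
    if (depth == fail_at)
        eh_throw(st, 42);
    if (depth > 0)
        work(st, depth - 1, fail_at);
    eh_pop_and_run(st);                     /* success: release our own block */
}

static int run_guarded(struct eh_state *st, int depth, int fail_at)
{
    if (setjmp(st->jb) != 0)
        return st->err;                     /* escape path */
    work(st, depth, fail_at);
    return 0;                               /* normal path */
}

/* Returns 0 on success or the error code; releases every block either way. */
int run_with_cleanups(int depth, int fail_at)
{
    struct eh_state st = { .npending = 0, .err = 0 };
    return run_guarded(&st, depth, fail_at);
}
```

Consider `run_with_cleanups(10, 5)`. Levels 10 down to 5 each allocate a block, so the error path releases exactly those 6 blocks before jumping. Only the catch point holds a jmp_buf.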

Hence, using setjmp and longjmp properly a billion times does not eat much memory: only a few hundred bytes on the call stack for one single local jmp_buf, which is popped when my_foo returns. Indeed, longjmp is slightly more costly than a plain return (but it performs the escape that return cannot), so you would prefer to use it only in error situations.

Using setjmp and longjmp is tricky, but efficient and portable. It does make your code harder to understand, as the setjmp documentation points out, so it is important to comment it seriously. Using setjmp and longjmp cleverly and wisely does not require "gigabytes" of RAM, as wrongly claimed in the edited question, because you consume only one single jmp_buf on the call stack, not billions of them. If you want more sophisticated control flow, you'll use a local jmp_buf at each dynamic "catch point" in the call stack. You'll probably have a few dozen of them, not billions. You would need millions of jmp_buf only in the hypothetical case of a recursion millions of call frames deep, with every frame being a catch point. That is not realistic: you'll never have a recursion depth of one million, even without any exception handling.

See this for a better explanation of setjmp for "exception" handling in C (and SFTW for other ones). FWIW, Chicken Scheme has a very inventive usage of longjmp and setjmp (related to garbage collection and to call/cc !)

Alternatives

setcontext(3) was part of POSIX (marked obsolescent in POSIX.1-2001) but was removed in POSIX.1-2008.

GCC has several useful extensions (some of them understood by Clang/LLVM): statement exprs, local labels, labels as values and computed goto, nested functions, constructing function calls, etc.

(My feeling is that you are misunderstanding some concepts, notably the precise role of the call stack, so your question is very unclear; I gave some useful references.)

returning a small `struct`

Notice also that on some ABIs, notably the x86-64 ABI on Linux, returning a small struct is extremely efficient, because both pointers or integers go through registers. Examples are a struct of two pointers, or of one pointer and one int, long or intptr_t number. You could take advantage of that: decide that your function returns a pointer to the primary result and an error code, both packed in a single small struct:

#include <stddef.h>

struct tworesult_st {
 void* ptr;
 int err;
};

struct tworesult_st myfunction (int foo) {
  void* res = NULL;
  int errcode = 0;
  /// do something
  if (errcode) 
    return (struct tworesult_st){NULL, errcode};
  else
    return (struct tworesult_st){res, 0};
}

On Linux/x86-64, the System V ABI returns such a 16-byte struct of integer class in two registers (RAX and RDX). So the code above (e.g. compiled with gcc -Wall -O) uses no stack for the returned struct.

Using such a function is simple and very efficient: no memory is involved, since the two-member `struct` is passed in processor registers. It could be as simple as:

#include <stdio.h>
#include <stdlib.h>
void *use_myfunction(void) {   // hypothetical caller
  struct tworesult_st r = myfunction(34);
  if (r.err)
    { fprintf(stderr, "myfunction failed %d\n", r.err); exit(EXIT_FAILURE); }
  return r.ptr;
}

Of course you could have some better error handling (it is up to you).
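For comparison with the setjmp approach, here is a self-contained sketch of propagating such a small struct through a deep call chain. The names `leaf` and `chain` are hypothetical. Each level pays one test-and-branch on `r.err`, which is exactly the per-level check the question wanted to avoid. That check is cheap when the struct comes back in registers.

```c
#include <stddef.h>

/* Same two-member result as above: a payload pointer and an error code. */
struct tworesult_st { void *ptr; int err; };

/* Hypothetical leaf: fails with error 7 when asked to. */
static struct tworesult_st leaf(void *payload, int fail)
{
    if (fail)
        return (struct tworesult_st){ NULL, 7 };
    return (struct tworesult_st){ payload, 0 };
}

/* Each level forwards an error upward with one test-and-branch;
   on success it does its own work (here: bump a counter). */
static struct tworesult_st chain(int depth, int *counter, int fail)
{
    if (depth == 0)
        return leaf(counter, fail);
    struct tworesult_st r = chain(depth - 1, counter, fail);
    if (r.err)
        return r;        /* error: skip the rest of this level */
    ++*counter;          /* success-only work */
    return r;
}
```

On success, every level runs its post-call work. On failure, none of them does, so the error reaches the top untouched.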

Other hints

Read much more about semantics, in particular operational semantics.

If portability is not the major concern, study the calling conventions of your system and its ABI, as well as the generated assembler code (gcc -O -Wall -S -fverbose-asm foo.c, then look inside foo.s), and code the relevant asm instructions.

Perhaps libffi could be relevant (but I still don't understand your goals, only guessed them).

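You could try using labels as values and computed gotos, but unless you understand the generated assembler code, the result might not be what you expect, because the stack pointer changes at function calls and returns. The GCC manual also states that you may not use this mechanism to jump to code in a different function, since totally unpredictable things happen.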
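What labels as values can legitimately do is select between continuation points inside a single function. Below is a minimal sketch using the GNU C extension (also accepted by Clang). `step_ok` is a hypothetical stand-in for the question's `foo()`. This still costs a check plus an indirect jump, so it is not the zero-overhead return the question asks for.

```c
#include <stdbool.h>

/* Hypothetical check standing in for foo(): true on success. */
static bool step_ok(int x)
{
    return x >= 0;
}

/* GNU C "labels as values": the label addresses are used only inside
   the function that defines them, as the GCC manual requires. */
int bar_sketch(int x)
{
    static void *const resume[2] = { &&escape, &&success };

    goto *resume[step_ok(x)];   /* index 0 = failure, 1 = success */

escape:
    return -1;                  /* error path */

success:
    return x * 2;               /* normal path */
}
```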

Self-modifying code is frowned upon (and "impossible" in standard C99), and most C implementations put the binary code in a read-only code segment. Read also about trampoline functions. Consider perhaps JIT compiling techniques, à la libjit, asmjit, GCCJIT.

(I firmly believe that the pragmatic answer to your concerns is either longjmp with suitable coding conventions, or simply returning a small struct. Both can be used portably in a very efficient way, and I cannot imagine a case where they are not efficient enough.)

Some languages, such as Scheme with its call/cc or Prolog with its backtracking features, are perhaps better suited than C99 to the needs of the OP.

I don't want to use longjmp, since it has overheads of saving the state, and it does seem redundant, since the state doesn't change between the regular and modified return. I'd really like to avoid it. The goal here may become more clear if you imagine this in a deeper call chain, that needs to be interrupted on failure without doing extensive checking. — IvanB, Nov 22 '15 at 11:15
@IvanB: you need to save some state, and `longjmp` is quite quick. You should edit your question to improve it and explain why you reject using `longjmp`; and your understanding is wrong, the state is changing. — Basile Starynkevitch, Nov 22 '15 at 11:15
Quite quick if you do it once, but what about a million times? Can't the state saving be avoided, and use foo's epilogue to restore it and only adjust the instruction pointer register? — IvanB, Nov 22 '15 at 11:19
Even a billion `longjmp` remains reasonable, and you seem to misunderstand what a call stack is — Basile Starynkevitch, Nov 22 '15 at 11:20
@BasileStarynkevitch - it seems that `longjmp` requires a jump buffer parameter, so that means this not only needs to be set a billion times, but also passed as a parameter a billion times. That just doesn't seem reasonable. Can you explain how the state changes? No extra variables are added in between the two points. — IvanB, Nov 22 '15 at 11:28
Why is that not reasonable? Did you benchmark? I believe you have several misconceptions in your head. The jump buffer is small enough to be passed very quickly as argument — Basile Starynkevitch, Nov 22 '15 at 11:40
And no, I cannot explain easily how the state changes. I taught semantics at Univ Paris 6 (Master's level, Computer Science), and it took me several hours of courses to explain that. I cannot afford spending that much time on a single SO answer. I gave you several references. And explaining that to you would require me to understand your misconceptions (and I only guessed them) — Basile Starynkevitch, Nov 22 '15 at 11:43
Take several hours to study the generated assembler code (if you can read it), and several days to read about continuations; I gave two pragmatic approaches: use `longjmp` and return a small `struct`; and *you* have not explained clearly enough what you are really asking. And portability looked like the main question — Basile Starynkevitch, Nov 22 '15 at 11:48
Then the answer I gave you should be enough. The small `struct` return has no overhead with x86-64 ABI, and `longjmp` has a negligible overhead. — Basile Starynkevitch, Nov 22 '15 at 11:54
You still did not explain your goals in widely accepted terms. Speak of call frames and/or registers. In particular, you did not mention exception in your question — Basile Starynkevitch, Nov 22 '15 at 11:59
What is "widely accepted terms"? Is common sense and simple logic too much to expect? What I aim to achieve is entirely obvious - enter an "escape" code path out of a deep call chain if something in it fails, and do it without overheads, by reusing the code generated by the compiler and altering where the code resumes execution on return. — IvanB, Nov 22 '15 at 12:05
But in *standard* C99, the **only way to escape** is `longjmp` (or `abort` & `exit`, if you accept escaping the entire program) — Basile Starynkevitch, Nov 22 '15 at 12:08
@BasileStarynkevitch - why is it not possible to simply modify the instruction at which execution resumes and simply `return`? The function epilogue restores the state and execution resumes a little ahead, skipping the escape path. Sounds simple enough, but maybe there is technical reason to make it impossible? — IvanB, Nov 22 '15 at 12:16
The code segment is generally read-only, and even when code is read-write, self-modifying code is non-reentrant (think of recursive functions) — Basile Starynkevitch, Nov 22 '15 at 13:00
@BasileStarynkevitch - isn't that kept on the stack? I mean the address of the instruction where execution resumes? — dtech, Nov 22 '15 at 13:21
@BasileStarynkevitch - your relentless adding of more and more stuff to your answer is heartwarming. But if I wanted to go for another solution, I'd simply have the functions return a bool and check that and make a jump. It is practically not possible to unwind the call stack without storing every jump buffer. While you were on it, I did some more research - it seems that modifying the instruction where execution resumes after the return is simple enough, as it is always stored in a uniform location in every implementation. The problem part is the frame restoration, which is getting skipped. — IvanB, Nov 22 '15 at 13:56
Frame restoration is implementation-specific and cannot be done in portable C (but in some sense, `longjmp` is the *only portable way* to restore the call stack). Some C implementations don't use any hardware stack (because there is none, e.g. on IBM System z) — Basile Starynkevitch, Nov 22 '15 at 13:58
As it is not part of the function epilogue, which is not context aware, and is done by the code at the call location. Too bad, it would have been a very efficient way to handle errors. — IvanB, Nov 22 '15 at 13:58
Function prologue and epilogue is implementation specific (and not required or mentioned by the C99 standard, IIRC) — Basile Starynkevitch, Nov 22 '15 at 14:03
So all that is needed to make my scheme work is have the state restoration code twice, once for the failure branch and once for the success branch. So that skipping the failure branch restores the frame state, thus avoiding any extra checks, return values or jumps. Total cost - adding an immediate value when restoring the program counter on success and 2x state restoration code. That's pretty efficient compared to most exception handling implementations I am aware of. — IvanB, Nov 22 '15 at 14:09
Some other gcc features are [Label attributes](https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/Label-Attributes.html#Label-Attributes), [return address](https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/Return-Address.html#Return-Address) and [goto labels](https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/Extended-Asm.html#GotoLabels). These are all non-standard C99; but they can be useful for different non-local and/or exception mechanisms. Such things will need to be ported to a new CPU; but you can keep the impact to a minimum with macros, etc. — artless noise, Nov 23 '15 at 16:11
Nice detailed answer, but doesn't your `setjmp` example in `my_foo` have [undefined behaviour](https://www.securecoding.cert.org/confluence/display/c/MSC22-C.+Use+the+setjmp%28%29,+longjmp%28%29+facility+securely)? — Adam, Nov 23 '15 at 18:22

score -9 · Accepted Answer · answered Nov 22 '15 at 15:13

After giving it some extra thought, it is not as simple as it initially appeared. One thing prevents it from working: the code of a function is not context-aware, meaning there is no way for it to know the frame it was invoked from. That has two implications:

1 - Modifying the saved instruction pointer (the return address), while not portable, is easy enough: every implementation defines a consistent place for it, usually the first thing on the stack. However, changing it to skip the escape trap also skips the code that restores the previous frame's state. That code lives in the caller, not in the current function, which has no information about that frame. If an extra check and jump are to be avoided, the remedy is to duplicate the state-restoring code in both locations. Unfortunately, this can only be done in assembly.

2 - The number of instructions that need to be skipped is also unknown. It depends on the caller's frame and varies with the number of locals that need destruction, so it is not a uniform value. The remedy would be to push both the error and the success return addresses on the stack when the function is invoked, so that it can return to one or the other depending on whether an error occurs. Unfortunately, that too can only be done in assembly.

It seems that such a scheme can only be implemented at the compiler level. It would demand its own calling convention, one that pushes two return addresses and inserts state-restoration code at both. The potential savings from this approach hardly justify the effort of writing a compiler.

Notice that such reasons explain why C calling conventions are not as universal as some people think: Languages like Ocaml, Common Lisp, Prolog have compiler implementations with calling conventions incompatible with those of C for similar reasons — Basile Starynkevitch, Nov 22 '15 at 17:36
I think you need to show what your expect for `foo` in assembler. On the ARM, it would modify the `lr` (return address). Your issue with `setjmp`/`longjmp` is it saves all registers; yet in your question, you say portable. You could possibly pass a `label` with gcc extensions and return there. Unfortunately, the `somewhereElse` may have a completely different stack frame so it is quite possible that other registers might need to change. — artless noise, Nov 23 '15 at 14:55
