I'm writing some code to debug stackful coroutines that use Boost.Context's make_fcontext and jump_fcontext, and have run into a small problem.

Normally a debugger cannot backtrace past the entry point of a stackful coroutine, because the coroutine executes on its own stack. This means that I cannot tell from the debugger where a coroutine was entered from. That, however, is not the problem I am asking about. I already solved it by adding some inline assembly and DWARF bytecode to the function I pass to make_fcontext:

__asm__ volatile (
  "mov %[caller_fcontext_t], %[somewhere]\n\t"
  ".cfi_escape /* DWARF bytecode to load caller_fcontext_t from "
  "             * somewhere and use it to load all the registers saved "
  "             * there by jump_fcontext */\n\t"
  "call %[another_function]"
  : /* stuff */ : /* stuff */ : /* stuff */);

This really does work and I can now backtrace to the point in the caller where it starts or resumes the inner coroutine - but only sometimes.

It turns out that gdb has a "sanity check": if the stack pointer moves in the "wrong" direction between call frames, gdb assumes that the stack is corrupt and stops the trace with the message "Backtrace stopped: previous frame inner to this frame (corrupt stack?)".

This gets triggered when my stacks are allocated in certain ways, but not in other ways. I even have a test with statically allocated stacks that triggers this failure when used in forward order but not when used in reverse order.
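The forward/reverse difference can be reproduced with a small model of the check. This is only a sketch, assuming a downward-growing stack (as on x86-64) where a frame counts as "inner" when its address is lower; the names `coroutine_stacks`, `is_inner`, and `backtrace_survives` are made up for illustration, and none of this is GDB's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Two statically allocated coroutine stacks. Index 0 sits at the lower
// address, index 1 directly above it.
alignas(16) unsigned char coroutine_stacks[2][4096];

// On a downward-growing stack a callee's frame lies below its caller's,
// so "inner" means "at a lower address".
bool is_inner(std::uintptr_t candidate, std::uintptr_t reference) {
    return candidate < reference;
}

// Models unwinding from a coroutine running on coroutine_stacks[inner_idx]
// back to the coroutine that resumed it on coroutine_stacks[outer_idx].
// The top of each stack is used as an approximation of the stack pointer.
bool backtrace_survives(std::size_t outer_idx, std::size_t inner_idx) {
    assert(outer_idx < 2 && inner_idx < 2);
    auto top_of = [](std::size_t i) {
        return reinterpret_cast<std::uintptr_t>(
            coroutine_stacks[i] + sizeof coroutine_stacks[i]);
    };
    // The sanity check gives up when the caller's frame looks inner to
    // (that is, lower than) the callee's frame.
    return !is_inner(top_of(outer_idx), top_of(inner_idx));
}
```

Under these assumptions, `backtrace_survives(0, 1)` (outer coroutine on the lower stack, "forward" order) is false, while `backtrace_survives(1, 0)` ("reverse" order) is true, which matches the behaviour described above.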

I found the portion of gdb's source code that performs this check: https://github.com/bminor/binutils-gdb/blob/master/gdb/frame.c#L737-L816

Now here's my actual question: How can I fix this?

Is there some assembly incantation I can write that tells GDB "trust me, I know what I'm doing"?

Filipp
  • Do you have any reference code or any specific documentation? I am trying to implement the same kind of logic in my fibers implementation, but I haven't been able to find a solution yet :( – Daniele Salvatore Albano Jan 08 '22 at 08:52
  • GDB ignores this heuristic for inline functions. So my solution was to wrap the `jump_fcontext` call in two functions: one `__attribute__((noinline))` and the other `__attribute__((always_inline))`. I used trial and error to find the combination that worked at all optimization levels used by my project. – Filipp Jan 10 '22 at 11:44
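The two-wrapper idea from the last comment might look roughly like the sketch below. It is a guess at the shape, not the code that was actually used: which layer is forced inline and which is kept out of line was found by trial and error, so the arrangement here is an assumption. GCC and Clang spell the forced-inline attribute `always_inline`. `fake_jump`, `resume_inlined`, and `resume_coroutine` are hypothetical names, and `fake_jump` merely stands in for `jump_fcontext` so that the example builds without Boost.

```cpp
// Hypothetical stand-in for jump_fcontext; a real version would switch
// stacks here instead of returning its argument.
void* fake_jump(void* arg) { return arg; }

// Forced-inline layer: the compiler folds this into its caller, so with
// debug info a debugger should show it as an inline frame rather than a
// normal one.
__attribute__((always_inline)) inline void* resume_inlined(void* arg) {
    return fake_jump(arg);
}

// Never-inlined layer: keeps one real call frame in the same place
// regardless of optimization level.
__attribute__((noinline)) void* resume_coroutine(void* arg) {
    return resume_inlined(arg);
}
```

Callers would go through `resume_coroutine` only. Whether this arrangement, or the opposite nesting, avoids the "previous frame inner to this frame" stop in a given build would still need to be checked in the debugger.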

1 Answer

> Now here's my actual question: How can I fix this?
>
> Is there some assembly incantation I can write that tells GDB "trust me, I know what I'm doing"?

There is currently no way to do this. Adding one would be a good idea, but it would probably require a DWARF extension of some kind, so it may be difficult to implement.

You can see evidence of this in the gdb sources: GCC had a similar issue involving -fsplit-stack, and it was worked around by simply hard-coding the name of the offending function into gdb:

  if (!morestack_name || strcmp (morestack_name, "__morestack") != 0)

A quick workaround for your personal use is to just comment out the early return here.
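To make the effect of that condition concrete, the decision can be modelled as below. This is a simplified sketch, not gdb's real code: `stop_backtrace` and its parameters are invented for illustration, and only the `strcmp` test against `"__morestack"` is taken from the line quoted above.

```cpp
#include <cstring>

// Returns true when the model would end the backtrace with
// "previous frame inner to this frame".
bool stop_backtrace(bool previous_frame_is_inner, const char* function_name) {
    if (!previous_frame_is_inner) {
        return false;  // stack pointer moved the expected way; keep unwinding
    }
    // The one hard-coded exemption: a frame belonging to __morestack may
    // continue the stack anywhere.
    if (function_name != nullptr &&
        std::strcmp(function_name, "__morestack") == 0) {
        return false;
    }
    return true;  // any other function, or no symbol at all, stops the trace
}
```

In this model a coroutine entry function with any other name, say a hypothetical `my_coroutine_entry`, always stops the trace once the previous frame looks inner, which is why the only lever is editing gdb itself.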

Tom Tromey
  • Excellent answer. I noticed that for some optimization levels, I do not encounter this failure in either direction. Specifically `-O0 -g` and `-O3` fail with "previous frame inner to this frame", but `-O2 -g` and `-O3 -g` don't. I suspect it has to do with how functions are inlined. Is there some combination of always inline and never inline I can use to trick gdb for any combination of compile flags? – Filipp Sep 26 '18 at 16:24
  • Not that I know of. Actually, I'm surprised to hear that it could be circumvented this way, and I don't have a theory for why that would be the case. – Tom Tromey Sep 26 '18 at 16:40