
I'm trying to debug a multithreaded program which somehow ends up with RIP=0x0 and lots of zeros on the stack. Is there any way to find out where the program was just one instruction before? When I try single-stepping, the result comes out different (likely due to some race condition), but if I just start the program and let it run, it consistently lands here.

So is there any way to trap on a jump/call to the zero address before it is actually taken, without single-stepping or emulation? Is there maybe some register holding the address of the previous instruction?

Ruslan
  • Are you implying that the stack trace in gdb is corrupted as well? – user7860670 Apr 28 '17 at 11:18
  • @VTT yes, `bt` shows the zeros I can see via `x/100gx $rsp`. – Ruslan Apr 28 '17 at 11:19
  • Have you tried running it under valgrind? – user7860670 Apr 28 '17 at 11:20
  • Yes, it does change the result of execution, and before the crash (which happens in another place) it reports "Warning: client switching stacks?" on a `ret` instruction. But that lands at a different address than the one I get without valgrind. – Ruslan Apr 28 '17 at 11:22
  • That is a good starting point for fixing things; it smells like a stack-allocated buffer overrun. – user7860670 Apr 28 '17 at 11:26
  • If your stack is wiped out, then there's obviously no way of doing this *after* the problem occurs. If you can't repro it by single-stepping, and you have a Heisenbug where any attempt to instrument your code would mask it, then you are pretty much SOL. Maybe you could run the code under some type of logging debugger that would store a separate stack frame? I'm not even sure if such a thing exists. – Cody Gray - on strike Apr 28 '17 at 12:02
  • Instead of `stepi`, try `watch $rip if $rip == 0`. It's still single-stepping, but it's faster and maybe less intrusive enough that you can catch your bug. – Mark Plotnick Apr 28 '17 at 15:37
  • @MarkPlotnick this appears much slower than valgrind for me. I waited for about 10 minutes, and it's still inside `ld-linux-x86-64.so.2`. – Ruslan Apr 28 '17 at 16:04
  • It probably is slower than valgrind. Is it possible to run your program up to a known good point, before the stack is clobbered, and start the watching or single-stepping there? – Mark Plotnick Apr 28 '17 at 19:15
  • @MarkPlotnick the mere fact of introducing a breakpoint at the known good point already changes the result, so that the segfault now happens with `$rip != 0`. – Ruslan Apr 28 '17 at 19:44
  • 1
    On x86-64 you can try gdb's "btrace" support (read in the manual about the `record` command). This uses an on-chip buffer to record the last N branches, so you can usually skip backward to find the last bad branch. – Tom Tromey Apr 29 '17 at 15:00
  • @TomTromey this looks interesting. It seems to use the hardware Branch Trace Store facility. But does GDB support multithreaded processes with this? Simple `record` doesn't. – Ruslan Apr 29 '17 at 15:20
  • I don't actually know. – Tom Tromey Apr 30 '17 at 00:14

2 Answers


So is there any way to trap on a jump/call to the zero address before it is actually taken, without single-stepping or emulation?

No.

Is there maybe some register holding the address of the previous instruction?

Not on x86 (there is such a register on HPPA).

Since your followup comments suggest that you have a stack buffer overflow that wipes the return address and eventually causes you to return to 0, note that:

  • valgrind is exceptionally weak at detecting such overflows and
  • address sanitizer should have little problem pointing you directly at the stack buffer overflow.

Since you suspect a race condition, note that thread sanitizer is even better for finding these; minimal build commands for both are sketched below.
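
For example, a minimal sketch (the `-fsanitize=` flags are the standard GCC/Clang ones; `t.c` stands in for your own sources and build command):

gcc -g -fsanitize=address t.c && ./a.out   # ASan: reports the out-of-bounds stack write with a call stack
gcc -g -fsanitize=thread t.c && ./a.out    # TSan: reports data races between threads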

Employed Russian
  • I've just found out from the comment by Tom Tromey that there exist such features as Branch Trace Store and Processor Trace in Intel's Atom and Core CPUs. Isn't that what I was asking for? (Still trying to figure out how it works and how to use it.) – Ruslan Apr 29 '17 at 15:28
  • @Ruslan Looks like I was wrong, there is indeed branch trace store. `perf branch record` may help you https://events.linuxfoundation.org/slides/2011/linuxcon-japan/lcj2011_nagai.pdf – Employed Russian Apr 29 '17 at 16:54
  • @Ruslan I've added a possibly better answer. – Employed Russian Apr 29 '17 at 17:27

Is there maybe some register holding the address of the previous instruction?

There is no such register, but there is Branch Trace Store, and GDB supports it with the `record btrace` command.

Note: the Wikipedia article on Branch Trace Store warns:

Branch tracing on Intel processors can cause 40x application run-time slow down.
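
Newer GDB versions also let you choose the recording format explicitly; if your CPU and kernel support Intel Processor Trace, that format has much lower overhead than BTS (a sketch, assuming a GDB built with the corresponding support):

(gdb) record btrace bts    # force the Branch Trace Store format
(gdb) record btrace pt     # use Intel Processor Trace instead, if available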

Here is how you could use `record btrace` to debug your problem:

cat t.c
#include <string.h>
int bar()
{
  char buf[10];
  memset(buf, 0, sizeof(buf));
  memset(buf, 'A', 100);  // overflow: writes 90 bytes past buf, clobbering the return address
}

int foo()
{
  return bar();
}

int main()
{
  return foo();
}

gcc -g t.c -fno-stack-protector   # no stack canary, so the overwritten return address is actually used

gdb -q ./a.out

(gdb) run
Starting program: /tmp/a.out

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400562 in bar () at t.c:7
7   }
(gdb) bt 5
#0  0x0000000000400562 in bar () at t.c:7
#1  0x4141414141414141 in ?? ()
#2  0x4141414141414141 in ?? ()
#3  0x4141414141414141 in ?? ()
#4  0x4141414141414141 in ?? ()
(More stack frames follow...)

Hard to debug: we have no idea what happened here (this, I think, models your current problem).

(gdb) start
Temporary breakpoint 1 at 0x400577: file t.c, line 16.
Starting program: /tmp/a.out

Temporary breakpoint 1, main () at t.c:16
16    return foo();
(gdb) record btrace
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400562 in bar () at t.c:7
7   }
(gdb) record instruction-history
719    0x00007ffff7a9e531 <memset+113>: movdqu %xmm8,0x20(%rdi)
720    0x00007ffff7a9e537 <memset+119>: movdqu %xmm8,-0x30(%rdi,%rdx,1)
721    0x00007ffff7a9e53e <memset+126>: movdqu %xmm8,0x30(%rdi)
722    0x00007ffff7a9e544 <memset+132>: movdqu %xmm8,-0x40(%rdi,%rdx,1)
723    0x00007ffff7a9e54b <memset+139>: add    %rdi,%rdx
724    0x00007ffff7a9e54e <memset+142>: and    $0xffffffffffffffc0,%rdx
725    0x00007ffff7a9e552 <memset+146>: cmp    %rdx,%rcx
726    0x00007ffff7a9e555 <memset+149>: je     0x7ffff7a9e4fa <memset+58>
727    0x00007ffff7a9e4fa <memset+58>:  repz retq
728    0x0000000000400561 <bar+52>: leaveq

The above instruction trace tells us that we crashed on the return from bar, and that memset was executing just before that return.
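
(As an aside: the fault in this demo is reported at an address inside bar rather than at RIP=0x4141414141414141 because that value is a non-canonical address on x86-64, so the CPU faults on the `retq` instruction itself instead of after jumping. A return to 0, as in your case, is canonical, which is why you end up with RIP=0x0.)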

(gdb) record instruction-history -
709    0x00007ffff7a9e4cd <memset+13>:  punpcklwd %xmm8,%xmm8
710    0x00007ffff7a9e4d2 <memset+18>:  pshufd $0x0,%xmm8,%xmm8
711    0x00007ffff7a9e4d8 <memset+24>:  cmp    $0x40,%rdx
712    0x00007ffff7a9e4dc <memset+28>:  ja     0x7ffff7a9e510 <memset+80>
713    0x00007ffff7a9e510 <memset+80>:  lea    0x40(%rdi),%rcx
714    0x00007ffff7a9e514 <memset+84>:  movdqu %xmm8,(%rdi)
715    0x00007ffff7a9e519 <memset+89>:  and    $0xffffffffffffffc0,%rcx
716    0x00007ffff7a9e51d <memset+93>:  movdqu %xmm8,-0x10(%rdi,%rdx,1)
717    0x00007ffff7a9e524 <memset+100>: movdqu %xmm8,0x10(%rdi)
718    0x00007ffff7a9e52a <memset+106>: movdqu %xmm8,-0x20(%rdi,%rdx,1)
(gdb)
699    0x00007ffff7a9e5b6 <memset+246>: retq
700    0x000000000040054b <bar+30>: lea    -0x10(%rbp),%rax
701    0x000000000040054f <bar+34>: mov    $0x64,%edx
702    0x0000000000400554 <bar+39>: mov    $0x41,%esi
703    0x0000000000400559 <bar+44>: mov    %rax,%rdi
704    0x000000000040055c <bar+47>: callq  0x400410 <memset@plt>

... And this is where the memset was called from.

705    0x0000000000400410 <memset@plt+0>:   jmpq   *0x200c02(%rip)        # 0x601018 <memset@got.plt>
706    0x00007ffff7a9e4c0 <memset+0>:   movd   %esi,%xmm8
707    0x00007ffff7a9e4c5 <memset+5>:   mov    %rdi,%rax
708    0x00007ffff7a9e4c8 <memset+8>:   punpcklbw %xmm8,%xmm8
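
Instruction 704, the `callq memset@plt` at `bar+47`, is the overflowing `memset(buf, 'A', 100)` call itself (note the length `$0x64` = 100 and fill byte `$0x41` = 'A' loaded just before it), so the recording pins down both the bad return and the code that clobbered the stack.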
Employed Russian
  • This looks really good for single-threaded programs. However, my GDB crashes with [this simple multithreaded test (in C++)](https://pastebin.mozilla.org/9020302) (checked with GDB 7.10.50.20151205-cvs and 8.0.50.20170429-git on a 32-bit userspace). – Ruslan Apr 29 '17 at 17:52
  • @Ruslan Your test works fine for me using GDB 7.9 and 7.12.50.20170222-git on x86_64 and i686 target (GDB itself is 64-bit). – Employed Russian Apr 29 '17 at 18:13
  • @Ruslan Also works fine using GDB 8.0.50.20170429-git built in 32-bit mode. – Employed Russian Apr 29 '17 at 18:32
  • Yeah, just tested on a couple of 32- and 64-bit Kubuntus of different versions, where it indeed works. My initial test system (LFS) must be special. Well, LFS is always special ;D. – Ruslan Apr 29 '17 at 18:47