
Minimal working example source:

use16
org 0x7c00

jmp 0x0000:@start


@start:
    cli
        mov ax,cs
        mov ds,ax
        mov es,ax
        mov ss,ax
        mov sp,0x7c00
    sti

    mov bp,sp

    call @fails

jmp @start

@fails:
    nop
retn

times 510-($-$$) db 0
dw 0xAA55

The above is assembled to a binary via FASM or NASM and written to the MBR of a VHD (x.vhd). QEMU is started with debugging support enabled:

qemu-system-i386.exe -m 512 -boot c -net nic -net user -hda x.vhd -no-acpi -s -S -cpu 486

In Cygwin, with GDB 9.1-1, the following commands are then issued to run GDB and attach to QEMU, identify the architecture, skip over the BIOS code and set a breakpoint at the start of the MBR, loaded at 0x7c00, before continuing:

gdb
target remote localhost:1234
set architecture i8086
stepi 11
break *0x7c00
continue
nexti 10

At this point, $ip points at the nop line.

@fails:
    nop  ;  <---
retn

Here is where the problem is encountered. With stepi, a temporary breakpoint is placed on the retn line, execution continues through the nop, and GDB breaks where it should. With stepo or nexti, however, execution continues but no breakpoint is ever reached, and GDB waits indefinitely. If a breakpoint is inserted manually after the nop (for example on the retn, or on jmp @start, the instruction to which retn returns), GDB breaks at it without issues. Listing all breakpoints at this point shows no new ones, so it seems nexti did not place a temporary breakpoint anywhere (or temporary breakpoints are not listed?).

After a lot of investigation, here are two strange "fixes" that let nexti step over the nop and break as it should: (1) set $ebp = 0x7bfa before issuing nexti, or (2) comment out the line mov bp,sp.

The questions are:

  1. Why does nexti seem to break the breakpoint mechanism at that particular location? Perhaps a breakpoint is put between instructions? Or nexti has issues with 16-bit code somehow?
  2. Why do the strange "fixes" outlined above work? How can $ebp be at all related to anything if it is not used at any point after the initialization? Does GDB use it internally?
Peter Cordes
NULLx
  • Yes, `ebp` is used by gdb during stack walk to determine stack frames and backtrace. Should still work, of course. The bug may even be in the qemu gdb stub. In any case, gdb is not a very good 16 bit real mode debugger. – Jester Apr 22 '20 at 12:14
  • My thoughts exactly. As a side question, any other suggestions for a simple bootloader development cycle and debugging tool are welcome. – NULLx Apr 22 '20 at 14:57
  • Use BOCHS. It's definitely the go-to recommendation for bootloaders and debugging the switch to 32 or 64-bit mode. Its built-in debugger can decode GDT, IDT, and page tables for you to make sure your code set them up the way you think you did. And decode / print exception details. – Peter Cordes Apr 25 '20 at 04:11
  • Makes sense, considering half the posts on real mode debugging mention it at one point or another. This will definitely help. I'll try it out. Thank you. – NULLx Apr 25 '20 at 08:53

1 Answer

The gdb command `set debug remote 1` enables remote protocol debugging and displays the exchange between gdb and the QEMU gdbserver. QEMU's replies in that exchange look reasonable. GDB, however, ends up setting a breakpoint at address 0x5ea7c17, which is wrong. It looks like it interprets 4 bytes at sp as the return address instead of only 2: 0x7c17 is the actual return address, while 0xea and 0x5 are the first two bytes of the boot sector code.
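The arithmetic behind that observation can be reproduced with a short Python sketch. It is only an illustration, not a trace of what GDB does internally. It assumes the standard 16-bit encodings and lengths for the instructions in the question (5 bytes for the far jmp, 3 for call, and so on) and that the call pushes its 2-byte return address at 0x7bfe, directly below the boot sector at 0x7c00. The names `lengths`, `stack_bytes`, `ret16` and `ret32` are made up for this example.

```python
import struct

# Assumed encoded lengths of the instructions from the question, in order:
# jmp far, cli, mov ax,cs, mov ds,ax, mov es,ax, mov ss,ax,
# mov sp,imm16, sti, mov bp,sp, call rel16
lengths = [5, 1, 2, 2, 2, 2, 3, 1, 2, 3]

# The return address pushed by `call @fails` is the address right after the call.
return_addr = 0x7C00 + sum(lengths)

# Memory starting at ss:sp after the call (sp = 0x7c00 - 2 = 0x7bfe):
# two bytes of return address, then the first two bytes of the boot
# sector at 0x7c00, which is the far jmp (EA 05 7C 00 00).
stack_bytes = struct.pack("<H", return_addr) + bytes([0xEA, 0x05])

ret16 = struct.unpack_from("<H", stack_bytes)[0]  # correct 16-bit read
ret32 = struct.unpack_from("<I", stack_bytes)[0]  # 4-byte read, as GDB appears to do

print(hex(return_addr))  # 0x7c17
print(hex(ret16))        # 0x7c17
print(hex(ret32))        # 0x5ea7c17
```

The 4-byte little-endian read picks up the 0xea and 0x05 bytes as the upper half of the value, which matches the bogus breakpoint address seen in the remote protocol log.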

jcmvbkbc
  • You are correct. One step closer to the answers. The command `set debug remote 1` is definitely helpful. Thanks for that. Regrettably, it does not shine a light on the internals of GDB itself - the link to $ebp fixing the issue (which might also be connected to stepi and nexti/stepo in this case). Any idea on that? – NULLx Apr 25 '20 at 09:03
  • GDB function `insert_step_resume_breakpoint_at_caller` is responsible for setting up a breakpoint that is supposed to catch execution after the `ni`. GDB command `set debug frame 1` makes it dump information related to dealing with stack frames. Cases with and without `bp` initialization result in different debug output. – jcmvbkbc Apr 26 '20 at 09:49
  • So the information from `set debug frame 1` (and, by extension, I assume, `info frame`) is related to `insert_step_resume_breakpoint_at_caller`? I had already noticed that the assumed return address was wrong in `info frame`, so the connection between it and this function is the missing link. If so, please add this to your answer, so I can mark it as the accepted solution. Separately, why would GDB look at $ebp at all for the return address? Does it assume `enter`/`leave` was used? – NULLx Apr 26 '20 at 10:15