
I'm developing a small demo that boots an x86_64 machine. During early init (real mode), I set video mode 3 via int 10h. I then write to the memory-mapped text buffer at 0xb8000. My second stage is already high-level C code. This worked perfectly in 32-bit protected mode with paging.

I changed the bootloader to also enable PAE and set LME, then jump to the second stage (which is now compiled for x86_64). This is where my display fell apart, and I have no idea what is going on. I've been debugging small samples and have something that works reliably even in 64-bit mode:

    for (uint32_t i = 0xb8000; i < 0xb8000 + (25 * 80 * 2); i += 2) {
        *((volatile uint16_t*)i) = 0x0741;
    }

As expected, this fills the screen with all "A"s. Here's the generated assembly:

000000000000843f <main>:
    843f:   f3 0f 1e fa             endbr64
    8443:   55                      push   %rbp
    8444:   48 89 e5                mov    %rsp,%rbp
    8447:   c7 45 fc 00 80 0b 00    movl   $0xb8000,-0x4(%rbp)
    844e:   eb 0c                   jmp    845c <main+0x1d>
    8450:   8b 45 fc                mov    -0x4(%rbp),%eax
    8453:   66 c7 00 41 07          movw   $0x741,(%rax)
    8458:   83 45 fc 02             addl   $0x2,-0x4(%rbp)
    845c:   81 7d fc 9f 8f 0b 00    cmpl   $0xb8f9f,-0x4(%rbp)
    8463:   76 eb                   jbe    8450 <main+0x11>
    8465:   90                      nop
    8466:   eb fd                   jmp    8465 <main+0x26>

However, when I change my code to this:

    volatile uint16_t *screen_base = (volatile uint16_t*)0xb8000;
    for (uint32_t i = 0; i < 25 * 80; i++) {
        screen_base[i] = 0x0741;
    }

It stops working: it outputs pink control characters (indicating that "0x07" is taken as the character and "0x41" as the color code), and it does not even fill the whole screen (the last two characters at the lower right are not filled). Here's the generated assembly:

000000000000843f <main>:
    843f:   f3 0f 1e fa             endbr64
    8443:   55                      push   %rbp
    8444:   48 89 e5                mov    %rsp,%rbp
    8447:   48 c7 45 f0 00 80 0b    movq   $0xb8000,-0x10(%rbp)
    844e:   00 
    844f:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
    8456:   eb 17                   jmp    846f <main+0x30>
    8458:   8b 45 fc                mov    -0x4(%rbp),%eax
    845b:   48 8d 14 00             lea    (%rax,%rax,1),%rdx
    845f:   48 8b 45 f0             mov    -0x10(%rbp),%rax
    8463:   48 01 d0                add    %rdx,%rax
    8466:   66 c7 00 41 07          movw   $0x741,(%rax)
    846b:   83 45 fc 01             addl   $0x1,-0x4(%rbp)
    846f:   81 7d fc cf 07 00 00    cmpl   $0x7cf,-0x4(%rbp)
    8476:   76 e0                   jbe    8458 <main+0x19>
    8478:   90                      nop
    8479:   eb fd                   jmp    8478 <main+0x39>

Weirdly enough, I can mask the issue by botching the pointer to 0xb8003, but this is obviously incorrect. I cannot figure out what is going on here. Does anyone have an idea what could be happening?

Peter Cordes

1 Answer


The asm is fairly painful to follow in a debug build but it looks normal. Are you 100% sure you are in 64-bit mode after a jmp far or whatever to a CS descriptor with the L bit set? Because there's strong evidence you aren't.

0x48 in 32-bit mode is the opcode for dec eax (instead of a REX.W prefix), which would explain an offset of -3 bytes. add %rdx, %rax becomes dec %eax ; add %edx, %eax. Earlier there's a dec %eax before the LEA that doubles the index, and another before the mov -0x10(%ebp),%eax load (harmless, since that load overwrites EAX). The store address therefore comes out as 2*(i-1) + (screen_base-1) = screen_base + 2*i - 3.
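To make the -3 concrete, here is a minimal host-side sketch in C that replays the failing loop body the way a CPU outside 64-bit mode would execute it, with each 0x48 byte acting as dec %eax. The function name is made up for illustration, and the byte sequences in the comments are taken from the disassembly in the question.

```c
#include <stdint.h>

/*
 * Replays the loop body of the failing build as 32-bit code:
 * every 0x48 byte runs as `dec %eax` instead of acting as REX.W.
 * Host-side illustration only; the function name is hypothetical.
 */
uint32_t misdecoded_store_address(uint32_t screen_base, uint32_t i)
{
    uint32_t eax, edx;

    eax = i;            /* 8b 45 fc   mov  -0x4(%ebp),%eax             */
    eax -= 1;           /* 48         dec  %eax                        */
    edx = eax + eax;    /* 8d 14 00   lea  (%eax,%eax,1),%edx          */
    eax -= 1;           /* 48         dec  %eax (overwritten next)     */
    eax = screen_base;  /* 8b 45 f0   mov  -0x10(%ebp),%eax            */
    eax -= 1;           /* 48         dec  %eax                        */
    eax += edx;         /* 01 d0      add  %edx,%eax                   */
    return eax;         /* address used by movw $0x741,(%eax)          */
}
```

With screen_base = 0xb8000 the first store goes to 0xb7ffd, and with the "botched" 0xb8003 it goes to 0xb8000. An odd store address would also account for the swapped character and attribute bytes.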

Your version that works avoids any REX prefixes by casting uint32_t to a pointer. Note that none of the instructions use 64-bit operand-size, R8-R15, or BPL-DIL, so none of them start with a 40 to 4F byte in machine code. (Except the initial mov %rsp, %rbp, but EAX isn't live at that point; the next access to EAX is write-only.)
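As a rough illustration of why the 40 to 4F range matters, the sketch below (host-side C, hypothetical helper names) maps each such byte to what it does outside 64-bit mode, assuming a 32-bit default operand size: 0x40 to 0x47 are one-byte inc instructions and 0x48 to 0x4F are one-byte dec instructions, with the low three bits selecting the register.

```c
#include <stdint.h>
#include <stdbool.h>

/* Register order used by the one-byte inc/dec encodings. */
const char *const legacy_reg_names[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* True for 0x40..0x4f, the bytes that are REX prefixes in 64-bit mode. */
bool is_rex_byte(uint8_t b)
{
    return (b & 0xf0) == 0x40;
}

/*
 * For a byte in 0x40..0x4f, returns the index (into legacy_reg_names)
 * of the register that a 32-bit-mode CPU would modify, and sets
 * *is_dec to true for dec (0x48..0x4f) or false for inc (0x40..0x47).
 * Returns -1 and leaves *is_dec untouched for any other byte.
 */
int legacy_inc_dec_reg(uint8_t b, bool *is_dec)
{
    if (!is_rex_byte(b))
        return -1;
    *is_dec = (b & 0x08) != 0;
    return b & 0x07;
}
```

For example, 0x48 (REX.W in 64-bit mode) yields index 0 with is_dec set, i.e. dec eax, and 0x40 yields inc eax.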

So that's pretty strong evidence the CPU's not in full 64-bit mode. Use Bochs to single-step your switch to long mode and check what mode you're actually in. And single-step by instructions in the not-working code; you'll see the 48 bytes decode as separate instructions. (You can do that in QEMU + GDB as well; GDB might not be sure what mode the CPU is in, but single-stepping via the trap flag TF will reflect what the CPU is actually doing.)

BTW, GCC debug builds prefer using EAX/RAX first for evaluating any expression, perhaps because it's the return value register. If GCC had happened to pick different registers, decrementing EAX wouldn't have mattered. But you'd definitely run into problems at some point, e.g. when GCC used RAX to hold a return value, or when it tried to use the DIL byte register with a 0x40 REX prefix (inc eax in 32-bit mode).

Peter Cordes
  • Well, there's *one* REX in the good code but it would decrement `eax` with no ill effects since it's overwritten later before any use. Awesome catch by the way. – paxdiablo May 31 '23 at 20:51
  • @paxdiablo: Thanks, fixed. The OP mentioned switching to 64-bit mode but omitted that from their [mcve], so I was definitely suspicious about it for an [osdev] question. We've had previous Q&As about disassembling machine code in the wrong mode, maybe even running in the wrong mode (especially 16 vs. 32), but I don't remember one quite like this where the problem was just a pointer offset. (32 vs. 64-bit mode usually doesn't affect instruction-length decoding, something I'm familiar with from code golf. Unlike 16 vs. 32 where immediates and addressing modes differ in length.) – Peter Cordes May 31 '23 at 21:03
  • I do believe you are right, but don't understand why. I'm first entering protected mode and do the transition to long mode afterwards, so I have two GDTs, one with bit 21 set in the CS/DS descriptors, one without. I do lgdt of the 64-bit GDT, then set 0x20 in %cr4 (PAE = 1), finally RDMSR of 0xc0000080, OR with 0x100, WRMSR, lastly far jump to 64 bit code. I need to investigate why this is not behaving as expected, but I think you got me on the right track, thanks! – performancematters May 31 '23 at 21:39
  • @performancematters: I'd recommend single-stepping in Bochs and using its built-in debugger to dump info on GDT entries and stuff like that. Perhaps your GDT entries aren't exactly what you intended, so having something else pretty-print them could help identify that. – Peter Cordes May 31 '23 at 21:42
  • @performancematters: https://stackoverflow.com/questions/22962251/how-to-enter-64-bit-mode-on-a-x86-64 *may* help, it seems fairly detailed on the steps you must take. I can see you're doing at least some of the steps in the accepted answer but there may be some you're not doing (the paging disable/enable and possibly LME/CR3 loading?). – paxdiablo Jun 01 '23 at 02:47
  • That was exactly the issue. I messed up my paging initialization, hence I was operating in PE, PAE, LME mode but no PG, therefore no LMA. Peter's interpretation of the results was spot-on. Since I've only identity-mapped memory, I didn't notice paging was disabled initially. Thanks Peter! – performancematters Jun 02 '23 at 09:53
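The resolution in the last comment (PE, PAE and LME set, but no PG, therefore no LMA) can be summarized as a small decision function. This is a sketch only: the enum and function names are hypothetical, it is plain host-side C operating on register values passed in (it does not read CR0, CR4 or EFER itself), and it assumes the standard bit positions CR0.PE = bit 0, CR0.PG = bit 31, CR4.PAE = bit 5 and EFER.LME = bit 8.

```c
#include <stdint.h>
#include <stdbool.h>

#define CR0_PE   (1u << 0)   /* protection enable            */
#define CR0_PG   (1u << 31)  /* paging enable                */
#define CR4_PAE  (1u << 5)   /* physical address extension   */
#define EFER_LME (1u << 8)   /* long mode enable (requested) */

enum cpu_mode {
    MODE_REAL,        /* CR0.PE clear                                  */
    MODE_PROTECTED,   /* legacy protected mode, long mode not active   */
    MODE_COMPAT,      /* long mode active, CS.L = 0                    */
    MODE_LONG64       /* long mode active, CS.L = 1                    */
};

/*
 * Models when the CPU activates long mode (sets EFER.LMA): LME must be
 * set and paging must be enabled with PAE. Setting LME alone only
 * requests long mode; until CR0.PG is set the CPU keeps decoding
 * instructions as legacy code, so 0x48 is `dec %eax`, not REX.W.
 */
enum cpu_mode classify_mode(uint32_t cr0, uint32_t cr4,
                            uint32_t efer_low, bool cs_l)
{
    if (!(cr0 & CR0_PE))
        return MODE_REAL;

    bool lma = (efer_low & EFER_LME) && (cr0 & CR0_PG) && (cr4 & CR4_PAE);
    if (!lma)
        return MODE_PROTECTED;

    return cs_l ? MODE_LONG64 : MODE_COMPAT;
}
```

With the values from the comment thread (CR0.PE set, CR4 = 0x20, EFER low dword = 0x100, CR0.PG clear), this returns MODE_PROTECTED even if the far jump targets a code segment with the L bit set.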