I have an assembly program that my OS runs right after GRUB hands over control, and I'm hitting a strange problem: a .word 65535 line causes QEMU to reboot, and I can't figure out why.

Using jmp $ as a breakpoint, I have narrowed the problem down and confirmed that it is the line mentioned above.

My Multiboot compliant code is:

/* Enable intel syntax */
.intel_syntax noprefix
/* Declare constants for the multiboot header. */
.set ALIGN,    1<<0             /* align loaded modules on page boundaries */
.set MEMINFO,  1<<1             /* provide memory map */
.set FLAGS,    ALIGN | MEMINFO  /* this is the Multiboot 'flag' field */
.set MAGIC,    0x1BADB002       /* 'magic number' lets bootloader find the header */
.set CHECKSUM, -(MAGIC + FLAGS) /* checksum of above, to prove we are multiboot */

/* 
Declare a multiboot header that marks the program as a kernel. These are magic
values that are documented in the multiboot standard. The bootloader will
search for this signature in the first 8 KiB of the kernel file, aligned at a
32-bit boundary. The signature is in its own section so the header can be
forced to be within the first 8 KiB of the kernel file.
*/
.section .multiboot
.align 4
.long MAGIC
.long FLAGS
.long CHECKSUM

/*
The multiboot standard does not define the value of the stack pointer register
(esp) and it is up to the kernel to provide a stack. This allocates room for a
small stack by creating a symbol at the bottom of it, then allocating 16384
bytes for it, and finally creating a symbol at the top. The stack grows
downwards on x86. The stack is in its own section so it can be marked nobits,
which means the kernel file is smaller because it does not contain an
uninitialized stack. The stack on x86 must be 16-byte aligned according to the
System V ABI standard and de-facto extensions. The compiler will assume the
stack is properly aligned and failure to align the stack will result in
undefined behavior.
*/
.section .bss
.align 16
stack_bottom:
.skip 16384 # 16 KiB
stack_top:

/*
The linker script specifies _start as the entry point to the kernel and the
bootloader will jump to this position once the kernel has been loaded. It
doesn't make sense to return from this function as the bootloader is gone.
*/
.section .text
.global _start
.type _start, @function
_start:
    /*
    The bootloader has loaded us into 32-bit protected mode on a x86
    machine. Interrupts are disabled. Paging is disabled. The processor
    state is as defined in the multiboot standard. The kernel has full
    control of the CPU. The kernel can only make use of hardware features
    and any code it provides as part of itself. There's no printf
    function, unless the kernel provides its own <stdio.h> header and a
    printf implementation. There are no security restrictions, no
    safeguards, no debugging mechanisms, only what the kernel provides
    itself. It has absolute and complete power over the
    machine.
    */

    /*
    To set up a stack, we set the esp register to point to the top of the
    stack (as it grows downwards on x86 systems). This is necessarily done
    in assembly as languages such as C cannot function without a stack.
    */
    mov stack_top, esp

    /*
    This is a good place to initialize crucial processor state before the
    high-level kernel is entered. It's best to minimize the early
    environment where crucial features are offline. Note that the
    processor is not fully initialized yet: Features such as floating
    point instructions and instruction set extensions are not initialized
    yet. The GDT should be loaded here. Paging should be enabled here.
    C++ features such as global constructors and exceptions will require
    runtime support to work as well.
    */

    /*
    GDT from the old DripOS bootloader, which was from the original
    project (The OS tutorial)
    */

    gdt_start:

        .long 0x0
        .long 0x0

    gdt_code: 
        .word 65535     /* <-------- this line causing problems */
        .word 0x0
        /*.byte 0x0
        .byte 0x9A*/ /*10011010 in binary*/
        /*.byte 0xCF*/ /*11001111 in binary*/
        /*.byte 0x0*/
    jmp $
    gdt_data:
        .word 0xffff
        .word 0x0
        .byte 0x0
        .byte 0x92 /*10010010 in binary*/
        .byte 0xCF /*11001111 in binary*/
        .byte 0x0

    gdt_end:

    gdt_descriptor:
        .word gdt_end - gdt_start - 1
        .long gdt_start

    #CODE_SEG gdt_code - gdt_start
    #DATA_SEG gdt_data - gdt_start

    lgdt [gdt_descriptor]
    jmp $
    /*
    Enter the high-level kernel. The ABI requires the stack is 16-byte
    aligned at the time of the call instruction (which afterwards pushes
    the return pointer of size 4 bytes). The stack was originally 16-byte
    aligned above and we've since pushed a multiple of 16 bytes to the
    stack since (pushed 0 bytes so far) and the alignment is thus
    preserved and the call is well defined.
    */
    call main

    /*
    If the system has nothing more to do, put the computer into an
    infinite loop. To do that:
    1) Disable interrupts with cli (clear interrupt enable in eflags).
       They are already disabled by the bootloader, so this is not needed.
       Mind that you might later enable interrupts and return from
       kernel_main (which is sort of nonsensical to do).
    2) Wait for the next interrupt to arrive with hlt (halt instruction).
       Since they are disabled, this will lock up the computer.
    3) Jump to the hlt instruction if it ever wakes up due to a
       non-maskable interrupt occurring or due to system management mode.
    */
    cli
1:  hlt
    jmp 1b

/*
Set the size of the _start symbol to the current location '.' minus its start.
This is useful when debugging or when you implement call tracing.
*/
.size _start, . - _start

I expect QEMU to keep running past the .word 65535 line, but instead QEMU reboots and the OS does not boot.

Michael Petch
Menotdan
  • You put data in the execution path, don't do that. Move your gdt stuff to the end. What do you think the cpu will do after processing line 68? – Jester Sep 03 '19 at 12:45
  • Ok, I will try that. I'm not super familiar with assembly as I work a wide variety of languages, but I will try to remember that in the future. Thanks! – Menotdan Sep 03 '19 at 12:49
  • OH yeah I just realized inserting bytes in the middle of execution probably isn't a good idea – Menotdan Sep 03 '19 at 12:51

1 Answer


As was pointed out in the comments, you placed the GDT in the middle of your code. The processor can't distinguish between code and data when they are mixed: it simply decodes whatever bytes come next in the execution path. After the instruction mov stack_top, esp, the CPU would have attempted to execute the GDT as code. Running objdump -Dz -Mintel (see footnote 1) on the object file shows the instructions that would have been executed:

boot.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_start>:
   0:   89 24 25 00 00 00 00    mov    DWORD PTR ds:0x0,esp

0000000000000007 <gdt_start>:
   7:   00 00                   add    BYTE PTR [rax],al
   9:   00 00                   add    BYTE PTR [rax],al
   b:   00 00                   add    BYTE PTR [rax],al
   d:   00 00                   add    BYTE PTR [rax],al

000000000000000f <gdt_code>:
   f:   ff                      (bad)
  10:   ff 00                   inc    DWORD PTR [rax]
  12:   00 eb                   add    bl,ch
  14:   fe                      (bad)

[snip]

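The CPU would have been able to execute the first few bytes of the GDT as bogus instructions, but when it reached the 0xffff in gdt_code, those bytes could not be decoded as a valid instruction. OBJDUMP shows them as (bad). With no interrupt handlers installed yet, an exception at this point escalates to a triple fault, which resets the machine; that would explain the reboot in QEMU.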

The fix is simple, as @Jester says: move the GDT (and all other data) after the code so that it is never in the execution path. Better still, place data and code in different sections so that they are kept separate.
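A minimal sketch of that layout, assuming the file is assembled as 32-bit code (for example with as --32) and that the Multiboot header from the question is kept unchanged. It also applies two points raised in the comments: Intel-syntax operand order (destination first) and a far jump to reload CS. The label reload_cs is a hypothetical name chosen for illustration, and the far jump form shown is what I would expect GNU assembler to accept in Intel syntax when CODE_SEG is defined before it is used.

```asm
.intel_syntax noprefix

/* (Multiboot header from the question goes here, unchanged.) */

/* The GDT lives in a data section, out of the execution path. */
.section .data
.align 8
gdt_start:
    .long 0x0, 0x0          /* mandatory null descriptor */
gdt_code:
    .word 0xffff            /* limit bits 0-15 */
    .word 0x0               /* base bits 0-15 */
    .byte 0x0               /* base bits 16-23 */
    .byte 0x9A              /* present, ring 0, code, execute/read */
    .byte 0xCF              /* 4 KiB granularity, 32-bit, limit bits 16-19 */
    .byte 0x0               /* base bits 24-31 */
gdt_data:
    .word 0xffff
    .word 0x0
    .byte 0x0
    .byte 0x92              /* present, ring 0, data, read/write */
    .byte 0xCF
    .byte 0x0
gdt_end:

gdt_descriptor:
    .word gdt_end - gdt_start - 1   /* size of the table minus one */
    .long gdt_start                 /* linear address of the table */

/* Selectors, defined before the far jump that uses CODE_SEG. */
.set CODE_SEG, gdt_code - gdt_start
.set DATA_SEG, gdt_data - gdt_start

.section .bss
.align 16
stack_bottom:
    .skip 16384
stack_top:

.section .text
.global _start
.type _start, @function
_start:
    lgdt [gdt_descriptor]
    jmp CODE_SEG:reload_cs          /* far jump: the only way to load CS */
reload_cs:                          /* hypothetical label name */
    mov ax, DATA_SEG
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax
    mov ss, ax
    mov esp, offset stack_top       /* destination first in Intel syntax */

    call main

    cli
1:  hlt
    jmp 1b
.size _start, . - _start
```

With this arrangement nothing but instructions sits between _start and the final hlt loop, so the CPU never tries to decode descriptor bytes as code.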


Footnotes

1. OBJDUMP option meanings:

  • -D disassembles the contents of all sections, not only those expected to contain code
  • -z disassembles blocks of zero bytes instead of skipping them
  • -Mintel displays the code using Intel syntax rather than the default AT&T syntax
Michael Petch
  • Yep, I have moved the data to the bottom, and the kernel finally loads. I've been trying to get this to work for a while. The only problem is now I'm getting a bunch of interrupt 13, so the GDT is probably misdefined or something. – Menotdan Sep 03 '19 at 14:58
  • @Menotdan There are a number of problems but most of them are in your Makefile and how you generate elf file and binaries. In `boot.s` you have issues with your setting of segment registers. You have the mov source and destination order reversed. As well you can't set CS with a MOV. You have to do a FAR JMP to set CS, and in order for a far jmp to work in GNU assembler you have to define your GDT and CODE_SEG specifically before the FAR JMP. A revised `Makefile` (considerable modifications) can be found here: https://pastebin.com/mPTUgQne and a new `boot.s` here: pastebin.com/8YVLVSRq – Michael Petch Sep 03 '19 at 17:24
  • @Menotdan : I was able to run your code and a teardrop shaped image (blue and white) appeared mid screen and the exceptions were gone. A note about the new Makefile. You may have to add things back in. I couldn't understand what the floppy disk and hard drive image stuff was all about so I have yanked it out. This makefile generates an ISO image called `myos.iso`. You can use `make run` and `make debug` to run it in QEMU. – Michael Petch Sep 03 '19 at 17:26
  • You are super helpful thanks SO MUCH. I will try this later and try to understand what you changed so I can learn. Thanks again! – Menotdan Sep 03 '19 at 20:13
  • Ok I see that you added a far jump, which I also did after I made that comment, and you reversed the order of mov, I thought it was different in AT&T syntax? And also you cleaned up the makefile but I don't have time right now to see if it works or to see what you changed in the makefile. – Menotdan Sep 03 '19 at 20:26
  • Ok so it works, but I don't think the timer is receiving interrupts and when you press keys on the keyboard, it starts spitting errors. But now I can debug it so yay!! – Menotdan Sep 03 '19 at 21:02
  • I think now its a problem with the interrupts ill have to work on it later – Menotdan Sep 03 '19 at 21:07
  • @Menotdan : Did you happen to use the James Molloy tutorial and the interrupt handling from there? I noticed one big problem is that the registers_t structure is passed by value and not by reference. You may wish to read one of my other SO answers on that tutorial bug: https://stackoverflow.com/questions/56481584/cannot-modify-data-segment-register-when-tried-general-protection-error-is-thro – Michael Petch Sep 03 '19 at 21:41
  • Actually I followed a tutorial that I think is based on that one and a couple of others. I think it's got a lot of bugs, but I fixed one where it had its own custom-defined types. – Menotdan Sep 03 '19 at 21:46
  • @Menotdan Yep, definitely *based* on that tutorial – Michael Petch Sep 03 '19 at 21:47
  • I have it booting from GRUB now, a combination of fixing that bug, and also using the CPU when it's done loading the OS – Menotdan Sep 05 '19 at 23:37
  • OK well it still doesn't work on real hardware, with the same exception as before, so I made it so the OS prints an error code when an exception occurs – Menotdan Sep 06 '19 at 12:11