
In the second stage of my bootloader, I'm trying to load some sectors from a virtual floppy disk into memory in Bochs, but when I invoke `int 0x13`, the routine never returns.

I believe the relevant code from my second stage is:

```
bootsys_start:
    mov %cs, %ax
    mov %ax, %ds

    /*
     * Remap IRQs. Interrupts have been disabled in the
     * bootloader already.
     */

    mov i8259A_ICW1($i8259A_IC4), %al
    out %al, i8259A_ICW1_ADDR($i8259A_MASTER)
    out %al, i8259A_ICW1_ADDR($i8259A_SLAVE)

    mov i8259A_ICW2($USER_INT_START), %al
    out %al, i8259A_ICW2_ADDR($i8259A_MASTER)
    mov i8259A_ICW2($USER_INT_START + 8), %al
    out %al, i8259A_ICW2_ADDR($i8259A_SLAVE)

    mov i8259A_ICW3($0x4), %al
    out %al, i8259A_ICW3_ADDR($i8259A_MASTER)
    mov i8259A_ICW3($0x2), %al
    out %al, i8259A_ICW3_ADDR($i8259A_SLAVE)

    mov i8259A_ICW4($i8259A_uPM & i8259A_x86), %al
    out %al, i8259A_ICW4_ADDR($i8259A_MASTER)
    out %al, i8259A_ICW4_ADDR($i8259A_SLAVE)

    call mm_detect

    /* Load the kernel now. */

    xor %bp, %bp
1:
    mov $KERNEL_ORG >> 0x4, %ax
    mov %ax, %es
    mov $KERNEL_ORG & 0xf, %bx
    mov $0x200 | KERNEL_SECTORS, %ax
    mov $(KERNEL_C << 0x8) | KERNEL_S, %cx
    mov $(KERNEL_H << 0x8) | FLOPPY_DRV, %dx
    int $0x13 /* <--- This int 0x13 doesn't seem to return */
    jnc 1f
    cmp $0x2, %bp
    je floppy_err
    inc %bp
    xor %ah, %ah
    int $0x13
    jmp 1b
```
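
For reference, the remapping at the top of `bootsys_start` is the standard 8259A initialization sequence: ICW1 goes to the command port, then ICW2 through ICW4 go to the data port. The C sketch below computes the bytes such a sequence normally writes. It assumes the `i8259A_*` macros expand to the usual bit layout (ICW1 = init bit plus IC4, ICW3 = slave cascaded on IRQ2, ICW4 = 8086 mode). `pic_init_words` and the `ICW*` constants are hypothetical names. The vector base is left as a parameter because the value of `USER_INT_START` isn't shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the standard 8259A ICW bits. */
#define ICW1_INIT 0x10 /* bit 4: this is an initialization command */
#define ICW1_IC4  0x01 /* bit 0: an ICW4 will follow */
#define ICW4_8086 0x01 /* bit 0: 8086/8088 mode */

/* The four initialization command words for one PIC. */
struct pic_icws {
    uint8_t icw1, icw2, icw3, icw4;
};

/* Build the ICWs for the master (is_master != 0) or the slave PIC.
 * base is the vector delivered for IRQ0 (master) or IRQ8 (slave);
 * in x86 mode its low 3 bits must be zero, because the PIC fills
 * them in with the IRQ line number. */
static struct pic_icws pic_init_words(int is_master, uint8_t base)
{
    assert((base & 0x7) == 0);
    struct pic_icws w;
    w.icw1 = ICW1_INIT | ICW1_IC4;
    w.icw2 = base;
    /* Master: bit mask of the input the slave is wired to (IRQ2 -> 0x04).
     * Slave: its cascade identity as a plain number (2). */
    w.icw3 = is_master ? 0x04 : 0x02;
    w.icw4 = ICW4_8086;
    return w;
}
```

For example, `pic_init_words(1, 0x20)` yields 0x11, 0x20, 0x04, 0x01 for the master.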

All of the code can be found in my [GitHub repository](https://github.com/qrzbl/kernel32). To build, just run `make all`, then run it in Bochs using the command `bochs`.


The first thing I did was verify that I really had all the parameters right. Running `r` in the Bochs debugger shell yields:

```
CPU0:
rax: 00000000_534d0201 rcx: 00000000_00000005
rdx: 00000000_534d0000 rbx: 00000000_00000000
rsp: 00000000_00007700 rbp: 00000000_00000000
rsi: 00000000_000e0005 rdi: 00000000_00000316
r8 : 00000000_00000000 r9 : 00000000_00000000
r10: 00000000_00000000 r11: 00000000_00000000
r12: 00000000_00000000 r13: 00000000_00000000
r14: 00000000_00000000 r15: 00000000_00000000
rip: 00000000_00000036
eflags 0x00007046: id vip vif ac vm rf NT IOPL=3 of df if tf sf ZF af PF cf
```

That is: ah = 0x2 (routine ID), al = 0x1 (number of sectors), ch = 0x0 (low 8 bits of the cylinder number), cl = 0x5 (sector number in bits 0-5, high two bits of the cylinder number in bits 6-7), dh = 0x0 (head number), dl = 0x0 (drive number).
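
The register packing described above can be checked mechanically. The following is a minimal C sketch of the Int 13h, AH=02h register layout; the struct and function names are hypothetical. If you feed it the low 16 bits of rax, rcx and rdx from the dump, it decodes to one sector, cylinder 0, sector 5, head 0, drive 0.

```c
#include <assert.h>
#include <stdint.h>

/* Parameters of an Int 13h, AH=02h (read sectors) request. */
struct chs_read {
    uint8_t  count;    /* AL: number of sectors to read */
    uint16_t cylinder; /* 10 bits: CH = low 8 bits, CL bits 6-7 = high 2 bits */
    uint8_t  sector;   /* CL bits 0-5, 1-based */
    uint8_t  head;     /* DH */
    uint8_t  drive;    /* DL: 0x00 = first floppy drive */
};

/* Pack a request into AX, CX and DX in the layout the BIOS expects. */
static void chs_encode(const struct chs_read *r,
                       uint16_t *ax, uint16_t *cx, uint16_t *dx)
{
    assert(r->sector >= 1 && r->sector <= 63);
    assert(r->cylinder <= 1023);
    *ax = (uint16_t)(0x02 << 8 | r->count);
    *cx = (uint16_t)((r->cylinder & 0xFF) << 8 |  /* CH: cylinder bits 0-7 */
                     (r->cylinder >> 2 & 0xC0) |   /* CL 6-7: cylinder bits 8-9 */
                     r->sector);                   /* CL 0-5: sector */
    *dx = (uint16_t)(r->head << 8 | r->drive);
}

/* Inverse of chs_encode: recover a request from the 16-bit registers,
 * e.g. the low halves of rax/rcx/rdx in a Bochs register dump. */
static struct chs_read chs_decode(uint16_t ax, uint16_t cx, uint16_t dx)
{
    struct chs_read r;
    assert(ax >> 8 == 0x02); /* AH must select "read sectors" */
    r.count    = ax & 0xFF;
    r.sector   = cx & 0x3F;
    r.cylinder = (uint16_t)((cx >> 8) | (cx & 0xC0) << 2);
    r.head     = dx >> 8;
    r.drive    = dx & 0xFF;
    return r;
}
```

The original `mov $(KERNEL_C << 0x8) | KERNEL_S, %cx` leaves CL bits 6-7 clear. That matches this encoding for any cylinder below 256, which covers every standard floppy geometry.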

`sreg` prints the following for es:

es:0x0000

and bx = 0x0, so the sector is loaded to 0x0:0x0, just as I intended.


I tried several things:

  1. Load to physical address 0x600

    I thought that overwriting the IVT or BDA while a BIOS interrupt routine is executing might not be a good idea, so I tried loading the sector to 0x600 (es = 0x60, bx = 0x0). (I know that the BDA is only 256 bytes in size.) Same result.

  2. Load the first sector on the disk

    Maybe reading the fifth sector is somehow out of bounds? The code in my first stage that uses int 0x13 to read my second stage works as expected, and the int 0x13 call in my second stage is similar, so I would have expected it to work too. As a test, I changed my second stage to read sector 1, and it still didn't work.

  3. Zeroing out the upper part of eax

    I figured that maybe there was a bug in the BIOS routine and it somehow used eax instead of ax, so I tried zeroing out the upper 16 bits of eax... to no avail.
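
Attempt 1 hinges on whether the read buffer touches the IVT or the BDA. The following C sketch makes that check explicit. It assumes 512-byte sectors, the IVT at 0x000-0x3FF and the 256-byte BDA at 0x400-0x4FF; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Real-mode address translation: physical = segment * 16 + offset. */
static uint32_t phys(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Nonzero if [start, start + len) intersects [lo, hi). */
static int ranges_overlap(uint32_t start, uint32_t len, uint32_t lo, uint32_t hi)
{
    return start < hi && lo < start + len;
}

/* Which BIOS-owned areas a read of `sectors` sectors into ES:BX touches.
 * Bit 0: IVT (0x000-0x3FF). Bit 1: BDA (0x400-0x4FF). */
static unsigned bios_area_hits(uint16_t es, uint16_t bx, unsigned sectors)
{
    assert(sectors > 0);
    uint32_t start = phys(es, bx);
    uint32_t len = sectors * SECTOR_SIZE;
    unsigned hits = 0;
    if (ranges_overlap(start, len, 0x000, 0x400))
        hits |= 1u;
    if (ranges_overlap(start, len, 0x400, 0x500))
        hits |= 2u;
    return hits;
}
```

With these helpers, `bios_area_hits(0x0000, 0x0000, 1)` reports an IVT hit and `bios_area_hits(0x0060, 0x0000, 1)` reports none. The 0x600 variant of attempt 1 therefore kept clear of both areas, so the hang cannot be blamed on the destination alone.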

As I mentioned, I had already loaded some sectors from disk into memory successfully (my first stage loads the second stage that way). The GPRs' contents right before that int 0x13 are as follows (obtained using `r` in the Bochs shell):

```
CPU0:
rax: 00000000_00000203 rcx: 00000000_00090002
rdx: 00000000_00000000 rbx: 00000000_00000000
rsp: 00000000_00007700 rbp: 00000000_00000000
rsi: 00000000_000e7cdd rdi: 00000000_000000e2
r8 : 00000000_00000000 r9 : 00000000_00000000
r10: 00000000_00000000 r11: 00000000_00000000
r12: 00000000_00000000 r13: 00000000_00000000
r14: 00000000_00000000 r15: 00000000_00000000
rip: 00000000_00007c59
eflags 0x00007046: id vip vif ac vm rf NT IOPL=3 of df if tf sf ZF af PF cf
```

`sreg` yields es = 0x8f60, a dynamically computed segment just below the EBDA.
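
As a sanity check on that destination, this C sketch computes where a 3-sector read to 0x8f60:0x0000 ends and how much room is left before the EBDA. The EBDA start of 0x9fc00 is taken from the E820 map quoted in the comments. The 512-byte sector size and the helper names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Physical address one past the last byte that an Int 13h read of
 * `sectors` sectors into ES:BX writes (real mode: segment * 16 + offset). */
static uint32_t read_end(uint16_t es, uint16_t bx, unsigned sectors)
{
    assert(sectors > 0);
    return ((uint32_t)es << 4) + bx + sectors * SECTOR_SIZE;
}

/* Bytes left between the end of that read and the start of the EBDA;
 * negative if the read would run into the EBDA. */
static int32_t ebda_headroom(uint16_t es, uint16_t bx, unsigned sectors,
                             uint32_t ebda_start)
{
    return (int32_t)ebda_start - (int32_t)read_end(es, bx, sectors);
}
```

For the values in the dump, the read ends at 0x8fc00, leaving 0x10000 bytes (64 KiB) before the EBDA.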

Comparing the two dumps, I don't see any significant difference that might affect the interrupt routine, so I don't think the problem lies in the parameters passed via registers.

Does anybody have other suggestions on what to do?

cadaniluk
  • Have you tried isolating the problem with a [mcve]? Specifically how you set up the segment registers considering that IP is 36h in the first case and 7c59h in the second. – Margaret Bloom Dec 06 '16 at 14:32
  • "so the sector is loaded to 0x0:0x0" - so, err, overwriting the interrupt vector table? – davmac Dec 06 '16 at 14:46
  • @MargaretBloom `cs` is `0x8f60` for the former, `0x0` for the latter. To clear things up: BIOS loads bootloader to physical address `0x7c00`, bootloader loads second-stage bootloader right below the EBDA, whose start value is obtained dynamically, second-stage bootloader loads kernel to physical address `0x0`. I'd do an MCVE to check that loading sectors works in general if **no** attempt worked. However, loading the secondary bootloader works just fine. – cadaniluk Dec 06 '16 at 14:47
  • @davmac I addressed that in my question, Ctrl-F for "0x600." I tried to load someplace else but failed. – cadaniluk Dec 06 '16 at 14:49
  • @Downvoter regardless, I wouldn't expect loading a sector to 0000:0000 using the BIOS to work. There may also be another issue, but I think this is likely a problem. – davmac Dec 06 '16 at 14:51
  • You are most likely overwriting something. Loading a 2nd stage bootloader right below the EBDA may not be a good idea if there is already something else there. Or if you used a wrong size. Try loading the 2nd bootloader at a fixed, safe, address. – Margaret Bloom Dec 06 '16 at 14:55
  • @Downvoter in any case your complete question is more or less "I could load a sector earlier, now I can't". Since you've eliminated stateless reasons (sector to load, address to load etc) the only logical explanation is that something else has changed. But without a full [MCVE], we can't know. – davmac Dec 06 '16 at 14:57
  • @MargaretBloom I ran [ax=0xe820, int 0x15 (comprehensive memory map)](http://www.ctyme.com/intr/rb-1741.htm) and the first entry was from `0x0` to `0x9fc00`; the latter equals the EBDA start address. The entry type was `0x1`, denoting memory available to the OS. As written in my question, I'm loading three sectors to `0x8f600`, way below the EBDA. The area must be free for use. – cadaniluk Dec 06 '16 at 15:03
  • Then I don't know what the issue may be, I'm sorry. I would try commenting out and simplifying as much code as possible and see when and if the problem disappears. The Bochs log says nothing interesting? Even with debug level on? (Things will get *very* slow, be patient) – Margaret Bloom Dec 06 '16 at 15:35
  • @davmac Well, the error could be anywhere, so I frankly don't know where to start with an MCVE. The repo is [on GitHub](https://github.com/qrzbl/kernel32); just do a `make all` and get code addresses using `objdump -d boot.out -mi8086`. bochs is supported; just run `bochs` in the source tree. But I'd better do as [MargaretBloom said](http://stackoverflow.com/questions/40995474/floppy-read-ah-0x2-int-0x13-takes-like-forever#comment69206877_40995474): comment stuff out until the error disappears and report back. It takes the most time, but it always succeeds. – cadaniluk Dec 06 '16 at 15:41
  • The problem is likely because you have remapped the IRQs on the 8259 to settings that are not what the BIOS expects. Remap the IRQs after you get in protected mode. At that point you won't be using the BIOS routines anymore and you can do as you please. BIOS probably uses the Floppy Drive IRQ6 (and system timer IRQ0) and it never hits the proper IVT because they were remapped. – Michael Petch Dec 06 '16 at 16:08
  • @Downvoter the idea with an MCVE is that you keep removing stuff until the problem goes away. Then you have either found the problem, or reduced the code to the point that you can't reduce it further, at which stage it is usually fine to post it all in the question. In this case, it sounds like you could have removed the IRQ remapping, and discovered the problem yourself. In other words, "comment stuff until the error disappears" is exactly what you should have done to begin with. – davmac Dec 06 '16 at 16:11
  • @MichaelPetch Dammit, you're right! :O If you write an answer, I'll be glad to accept it. – cadaniluk Dec 06 '16 at 16:54
  • @davmac The code base isn't particularly large, but still large enough that commenting out and checking if the error disappears is tedious. I thought somebody could help me out faster than that. – cadaniluk Dec 06 '16 at 16:56
  • @Downvoter understood, but when you post a question and people start asking for an MCVE and your response is (a) I'd do one but loading the secondary bootloader worked fine and then (b) I wouldn't know where to start - then there's some kind of misunderstanding about what an MCVE is and/or why it is needed. In general if you haven't fairly exhaustively tried to solve the problem yourself, don't post a question. Yes, someone might guess the solution ("are you re-routing IRQs before calling int 0x13 perhaps?"); that doesn't mean you can/should skip the legwork. – davmac Dec 06 '16 at 17:08

1 Answer


A couple of issues with your Int 13h/AH=02h floppy disk reading code:

  1. This one you already identified in your question. Reading sectors on top of 0x0000:0x0000 is a bad idea when you are running in real mode. That will clobber the interrupt vector table (IVT). The area from 0x0000:0x0000 to 0x0040:0x0000 is the IVT; the area 0x0040:0x0000 to 0x0060:0x0000 is the BIOS Data Area (BDA). The BDA should be considered a scratch area that real mode BIOS routines may use.

    To fix this, load it somewhere safe, such as 0x0060:0x0000 (physical address 0x00600).

    Once in protected mode, the area between 0x00000000 and 0x00000600 can be reclaimed for other uses. Note: do not use the Extended BIOS Data Area (EBDA) as general-purpose memory, because System Management Mode (SMM) and the Advanced Configuration and Power Interface (ACPI) may write to it.

  2. Your code remaps the 8259A PICs to get ready for protected mode. In doing so, the IRQs are remapped to different vectors in the IVT. Int 13h routines may rely on interrupts firing and on the BIOS interrupt handlers doing work that floppy disk reads need; IRQ0 (system timer) and IRQ6 (floppy controller) are possible candidates. If you move the base of the 8259As elsewhere, the interrupt handlers that the BIOS installed won't execute. This will likely lead to unexpected behavior, including Int 13h never returning.

    To fix the problem, I recommend remapping the base of the 8259A PICs after you are in protected mode. By that time you are likely done with the BIOS interrupts, so it shouldn't be an issue.
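
To make the second point concrete: the PIC adds the IRQ number to the base programmed in ICW2, and in real mode the CPU then fetches that vector's handler pointer from the IVT at vector * 4. This C sketch assumes the standard PC/AT BIOS bases (0x08 for the master, 0x70 for the slave). It uses 0x20/0x28 purely as a hypothetical value for `USER_INT_START`, since the real value isn't shown. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Vector bases the BIOS programs for real mode on a PC/AT:
 * IRQ0-7 -> INT 08h-0Fh, IRQ8-15 -> INT 70h-77h. */
#define BIOS_MASTER_BASE 0x08
#define BIOS_SLAVE_BASE  0x70

/* Interrupt vector the CPU receives for `irq`, given the two PIC bases. */
static uint8_t irq_vector(unsigned irq, uint8_t master_base, uint8_t slave_base)
{
    assert(irq < 16);
    return (uint8_t)(irq < 8 ? master_base + irq : slave_base + (irq - 8));
}

/* Physical address of the 4-byte real-mode IVT entry (offset, segment)
 * that the CPU reads for `vector`. */
static uint32_t ivt_entry(uint8_t vector)
{
    return (uint32_t)vector * 4;
}
```

With the BIOS bases, the floppy controller's IRQ6 arrives as INT 0Eh, whose handler pointer is at 0x38. After remapping the master to a base of 0x20, it would arrive as INT 26h, looked up at 0x98, which does not point at the BIOS's IRQ6 handler. A BIOS routine waiting for that interrupt may then wait forever.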

Michael Petch