
I have some x86 realmode assembly code which isn't behaving exactly as expected. I believe the issue relates to an incorrectly calculated jmp/call offset, but I might be mistaken.

Here is the assembler language code:

[org 0x7c00]

mov ah, 0x0e

mov al, 'h'
int 0x10

mov al, 'e'
int 0x10

mov al, 'l'
int 0x10

mov al, 'l'
int 0x10

mov al, 'o'
int 0x10

mov al, '!'
;int 0x10
call print_char

;loop:
;    jmp loop

mov si, mystring
call print_string

jmp $


; fill to 512 bytes
times 510 - ($ - $$) db 0
dw 0xAA55

; the address is stored in si
print_string:
    pusha
    ; load character from si
    mov al, [si]
    cmp al, 0x00
    jz print_string_end
    call print_char ; print the char using the print_char function
    inc si ; increment the string printing index si
print_string_end:
    popa
    ret

; print function: print a single character
; the character is stored in al
print_char:
    pusha
    mov ah, 0x0e
    int 0x16
    popa            ; don't know what registers int 0x16 modifies
    ret

mystring:
db "loading operating system",0x00

And here is the disassembly, produced with `objdump -D -b binary -m i8086 -M intel bootsector.bin`:

bootsector.bin:     file format binary


Disassembly of section .data:

00000000 <.data>:
   0:   b4 0e                   mov    ah,0xe
   2:   b0 68                   mov    al,0x68
   4:   cd 10                   int    0x10
   6:   b0 65                   mov    al,0x65
   8:   cd 10                   int    0x10
   a:   b0 6c                   mov    al,0x6c
   c:   cd 10                   int    0x10
   e:   b0 6c                   mov    al,0x6c
  10:   cd 10                   int    0x10
  12:   b0 6f                   mov    al,0x6f
  14:   cd 10                   int    0x10
  16:   b0 21                   mov    al,0x21
  18:   e8 f2 01                call   0x20d
  1b:   be 14 7e                mov    si,0x7e14
  1e:   e8 df 01                call   0x200
  21:   eb fe                   jmp    0x21
    ...
 1fb:   00 00                   add    BYTE PTR [bx+si],al
 1fd:   00 55 aa                add    BYTE PTR [di-0x56],dl
 200:   60                      pusha  
 201:   8a 04                   mov    al,BYTE PTR [si]
 203:   3c 00                   cmp    al,0x0
 205:   74 04                   je     0x20b
 207:   e8 03 00                call   0x20d
 20a:   46                      inc    si
 20b:   61                      popa   
 20c:   c3                      ret    
 20d:   60                      pusha  
 20e:   b4 0e                   mov    ah,0xe
 210:   cd 16                   int    0x16
 212:   61                      popa   
 213:   c3                      ret    
 214:   6c                      ins    BYTE PTR es:[di],dx
 215:   6f                      outs   dx,WORD PTR ds:[si]
 216:   61                      popa   
 217:   64 69 6e 67 20 6f       imul   bp,WORD PTR fs:[bp+0x67],0x6f20
 21d:   70 65                   jo     0x284
 21f:   72 61                   jb     0x282
 221:   74 69                   je     0x28c
 223:   6e                      outs   dx,BYTE PTR ds:[si]
 224:   67 20 73 79             and    BYTE PTR [ebx+0x79],dh
 228:   73 74                   jae    0x29e
 22a:   65 6d                   gs ins WORD PTR es:[di],dx
    ...

The file was assembled with `nasm bootsector.asm -f bin -o bootsector.bin`.
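One way to check what that build produced is to look at the layout of the flat binary. The Python sketch below is only an illustration and is not part of the boot code. It assumes the file is a raw image whose first 512 bytes are what the BIOS loads. It checks for the 55 AA bytes that `dw 0xAA55` stores at offsets 510 and 511, and it counts the bytes that fall past the first sector. The synthetic image uses zero placeholder bytes, sized to match the listing above. For the real file, you would pass `open("bootsector.bin", "rb").read()` instead.

```python
SECTOR_SIZE = 512
SIGNATURE = b"\x55\xaa"  # `dw 0xAA55` is stored little-endian as 55 AA

def check_boot_image(image: bytes) -> dict:
    """Summarise a flat boot image: size, signature, and bytes the BIOS won't load."""
    has_signature = len(image) >= SECTOR_SIZE and image[510:512] == SIGNATURE
    unloaded = max(0, len(image) - SECTOR_SIZE)  # everything past the first sector
    return {"size": len(image), "has_signature": has_signature, "unloaded_bytes": unloaded}

# Synthetic image with the question's layout (zero placeholder bytes, not real code).
code = bytes(0x23)                    # offsets 0x000-0x022, up to and including `jmp $`
padding = bytes(510 - len(code))      # what `times 510 - ($ - $$) db 0` emits
functions = bytes(0x14)               # print_string and print_char, offsets 0x200-0x213
mystring = b"loading operating system\x00"
image = code + padding + SIGNATURE + functions + mystring

print(check_boot_image(image))
# {'size': 557, 'has_signature': True, 'unloaded_bytes': 45}
```

A non-zero `unloaded_bytes` means that some labels (here print_string, print_char and mystring) live in a sector the BIOS never reads.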

At offset 0x1e there is the instruction call 0x200. Unless I misunderstand, this pushes the return address (the address of the next instruction) onto the stack and jumps to execute code at offset 0x200. That is somewhere in memory below the origin, which is 0x7c00, so it appears to be a different address from where the function print_string resides.

At least I think that is what is happening, but I might be completely wrong as I'm new to this.

Also, maybe I'm not allowed to have a file which exceeds 512 bytes as a boot sector?

FreelanceConsultant
    "Also - maybe I'm not alowed to have a file which exceeds 512 bytes as a boot sector?" Correct, that's what "sector" means: the BIOS loads the first sector from the disk into memory, and a sector is 512 bytes. Stuff beyond offset 512 in your file will end up in the second or later sector, and so it won't be in memory for you to execute or access. You would have to load it into memory yourself with appropriate `int 0x13` calls (that are themselves executed from the first sector). – Nate Eldredge Sep 05 '21 at 16:30
  • @NateEldredge Ok I guess that's the issue then - I was emulating this with QEMU, if that makes any difference (it probably doesn't?) – FreelanceConsultant Sep 05 '21 at 16:32
  • @NateEldredge Although this does raise the question: Where does the offset `0x200` come from? – FreelanceConsultant Sep 05 '21 at 16:32
    However, the `0x200` is an artifact of `objdump` not knowing that you intend to load the code at address `0x7c00`; it assumes `0` as the base address. So in fact, assuming a BIOS that sets CS to 0, this instruction will branch to offset `0x7e00`. But as mentioned above, your code won't be there unless you take some action to load it. – Nate Eldredge Sep 05 '21 at 16:32
    The `call` instruction itself contains a displacement, so what it actually does is branch to the address of the next instruction plus `0x1df`. The assembler computed that number `0x1df` as the difference between the address of the instruction following `call print_string`, and the address of the label `print_string`, based on the instructions and data that you coded to go in between. `objdump` uses its guess about the address of the `call` instruction to try to tell you the absolute address of the destination, but in this case it is wrong. – Nate Eldredge Sep 05 '21 at 16:34
    No, the fact you're using QEMU is irrelevant; in this case QEMU emulates the exact same behavior that a real machine and BIOS would have. – Nate Eldredge Sep 05 '21 at 16:36
  • @NateEldredge After moving those functions before the 0xAA55, I still don't get working output, but I guess I should ask a separate question about that, as it might be a different issue? – FreelanceConsultant Sep 05 '21 at 16:39
    Yeah, it would be a separate issue, so ask a separate question. One thing I did notice is that you haven't initialized `bx` before your `int 0x10` calls, so the character may be printed to a video page that is not currently visible, and it also may be an unpredictable color. See http://www.ctyme.com/intr/rb-0106.htm. Also, the `int 0x16` in `print_char` looks like a mistake and should probably be `int 0x10` again. – Nate Eldredge Sep 05 '21 at 16:43
  • A piece of general advice is that Bochs has a better real-mode debugger than QEMU, so you might like to use it so that you can more easily single-step your code and figure out where it is going wrong. – Nate Eldredge Sep 05 '21 at 16:46
  • @NateEldredge To be perfectly honest I've not learned about how to operate the debugger yet - it is something I should do. I think you're right about the `0x10`, and I can guess why `bx` needs to be initialized, I assume it stores properties about how the character is rendered. Will read the linked webpage shortly. – FreelanceConsultant Sep 05 '21 at 16:47
    Yes, getting a debugger working, and learning to use it, are task #1 in any assembly development project. It's not hard, but you will waste incredible amounts of time if you don't. It would be like starting a carpentry project without having a ruler or tape measure. – Nate Eldredge Sep 05 '21 at 16:49
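The first comment notes that anything past the first sector has to be loaded with `int 0x13`. The legacy read function (AH=02h) takes a cylinder/head/sector (CHS) address, so a quick Python sketch of the offset to LBA to CHS arithmetic may help. The geometry is an assumption: the defaults model a 1.44 MB floppy with 18 sectors per track and 2 heads. A real drive's geometry can be queried with `int 0x13`, AH=08h. Note that CHS sector numbers start at 1, not 0.

```python
SECTOR_SIZE = 512

def offset_to_lba(file_offset: int) -> int:
    """0-based logical sector that contains a given byte offset of the disk image."""
    return file_offset // SECTOR_SIZE

def lba_to_chs(lba: int, sectors_per_track: int = 18, heads: int = 2) -> tuple[int, int, int]:
    """Convert an LBA to (cylinder, head, sector); defaults assume a 1.44 MB floppy."""
    cylinder = lba // (heads * sectors_per_track)
    head = (lba // sectors_per_track) % heads
    sector = lba % sectors_per_track + 1   # CHS sector numbers start at 1
    return cylinder, head, sector

lba = offset_to_lba(0x200)   # print_string's offset in bootsector.bin
print(lba, lba_to_chs(lba))  # 1 (0, 0, 2)
```

So file offset 0x200, where print_string ended up, is LBA 1, which is cylinder 0, head 0, sector 2. For AH=02h you would set CH=0, DH=0 and CL=2, put the number of sectors in AL, leave the drive number in DL (the BIOS passes the boot drive there), and point ES:BX at the buffer. Loading to 0000:7E00 keeps the code at the addresses that `org 0x7c00` assumed.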

2 Answers


Your code exceeds 512 bytes. The part beyond the first sector isn't loaded into RAM, so the call actually jumps to memory that doesn't contain your code. You have to either load the next sector yourself (before the jump/call), or arrange your code like this:

; you should probably set up the stack somewhere here, at the start
; ...

; ...
call func
; ...

; your hang instruction
jmp $


; The code below is only executed when you explicitly call it (or jump
; to it, but then its ret won't work, because a jmp pushes no return
; address). Because it sits before the 0xaa55 signature, it is part of
; the first sector, so the BIOS loads it into memory.
func:
   ; ... stuff
   ret

; the padding
times 510 - ($ - $$) db 0
dw 0xaa55

(It may help to remember that x86 can encode one assembly mnemonic with more than one opcode, so there are several different forms of "call" to watch out for. E8 is the near call with a 16-bit relative displacement.)

The "call 0x200" disassembled at offset +1e is coded as e8 df 01, which the CPU will execute as a relative call to the next instruction +01df.

Because objdump defaulted to a base address of 0, it shows the target as 0x21 + 0x1df = 0x200, or 512 in decimal. That adds up: print_string was assembled immediately after your 512-byte boot sector.
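The same arithmetic can be written out as a small Python sketch. It is illustrative only, and the helper names are made up. An `E8` near call is three bytes: the opcode followed by a signed, little-endian 16-bit displacement, which is added to the address of the next instruction. The `address` argument is whatever offset you assume the instruction has. That is why the result differs between objdump's base of 0 and `org 0x7c00`.

```python
import struct

def call_rel16_target(insn: bytes, address: int) -> int:
    """Target IP of an E8 (near call, rel16) instruction whose first byte is at `address`."""
    if len(insn) < 3 or insn[0] != 0xE8:
        raise ValueError("expected E8 followed by a 16-bit displacement")
    (disp,) = struct.unpack_from("<h", insn, 1)   # signed, little-endian
    return (address + 3 + disp) & 0xFFFF           # IP wraps within the 64 KiB segment

def call_rel16_encode(address: int, target: int) -> bytes:
    """Bytes an assembler emits for `call target` placed at `address`."""
    disp = (target - (address + 3)) & 0xFFFF
    return b"\xE8" + struct.pack("<H", disp)

call_print_string = bytes.fromhex("e8df01")      # from the objdump listing, offset 0x1e
print(hex(call_rel16_target(call_print_string, 0x1E)))           # 0x200  (objdump's base 0)
print(hex(call_rel16_target(call_print_string, 0x7C00 + 0x1E)))  # 0x7e00 (org 0x7c00)
print(call_rel16_encode(0x1E, 0x200).hex())                      # e8df01
```

Encoding runs the other way. The assembler subtracts the address of the following instruction from the label's address, so 0x200 - 0x21 = 0x1df. Both addresses shift by the same amount under `org`, so the bytes stay the same.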

If your code is loaded at 0000:7c00 then the relative call would go to 0000:7e00, which is calculated correctly, but as others have said that code will not be there, because it wouldn't be loaded by BIOS, which only loads the first sector.
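In real mode, a segment:offset pair maps to the physical address segment × 16 + offset. The Python sketch below is illustrative and does not model the 1 MiB wraparound. It shows that the relative call reaches physical 0x7E00 whether the BIOS starts the code at 0000:7C00 or at 07C0:0000. The displacement is added to IP, so it does not depend on the `org` value.

```python
def linear(segment: int, offset: int) -> int:
    """Physical address of segment:offset in real mode (1 MiB wraparound not modelled)."""
    return (segment << 4) + offset

CALL_OFFSET_IN_SECTOR = 0x1E   # `call print_string`, from the objdump listing
DISPLACEMENT = 0x1DF           # from the bytes e8 df 01

for cs, entry_ip in [(0x0000, 0x7C00), (0x07C0, 0x0000)]:
    call_ip = entry_ip + CALL_OFFSET_IN_SECTOR
    target_ip = (call_ip + 3 + DISPLACEMENT) & 0xFFFF
    print(f"CS={cs:04X}: target {cs:04X}:{target_ip:04X} -> linear {linear(cs, target_ip):#x}")
# CS=0000: target 0000:7E00 -> linear 0x7e00
# CS=07C0: target 07C0:0200 -> linear 0x7e00
```

Either way the call lands at physical 0x7E00, which is exactly where the second sector would go if you loaded it there yourself.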

I wrote a boot sector a while ago, and my advice is: a) it's easy to run out of space, so use compact code; b) don't assume anything except that CS:IP points to your code. If you rely on DS, ES or SS, you may find that different BIOSes and emulators set them differently, so try "mov ax,cs" and "mov ds,ax" etc. near the top, to be safe.

Your code uses "mov al,[si]" to load string data. si is paired with ds by default, so it loads from ds:[si]. So perhaps your unexpected output is because ds:[si] points to the wrong data. If you get Bochs set up, you'll be able to find out.

  • Don't depend on `cs` = 0 as an input. It can be = 7C0h. Instead use `xor ax, ax` to zero a register then `mov ds, ax` – ecm Sep 12 '21 at 22:16
  • I agree we can't rely on CS:IP always being 0000:7c00 but in this case I was saying the code wants CS and DS to be the same, to be able to use lodsb or "mov al,[si]". – rupertreynolds Sep 12 '21 at 22:24
    @rupertreynolds: But the code is passing an *absolute* address with `mov si, mystring`. That offset will be calculated relative to `org 0x7c00`, not IP, because this isn't position-independent code (which is a pain to do before x86-64, unless you avoid static storage). ECM is correct that this code relies on DS=0, not DS=CS. – Peter Cordes Sep 13 '21 at 00:48
  • I can see that could work, although my last boot sector had to cope with a weird BIOS that loaded at a completely different address (not 0x07c00 linear), so I had to write completely relocatable code for the first sector, which was a headache. – rupertreynolds Sep 13 '21 at 13:02
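Peter Cordes's point can be checked with the same segment arithmetic. The sketch below is Python and illustrative only; `string_linear_address` is a made-up helper. NASM encodes `mov si, mystring` as the absolute offset org + 0x214, where 0x214 is the label's position in the file according to the objdump listing. The sketch assumes the whole image sits at physical 0x7C00 onward, which for the question's layout also requires loading the second sector. Under that assumption, the string is only found if DS matches the `org`: DS=0 with `org 0x7c00`, or DS=0x07C0 with `org 0`.

```python
MYSTRING_OFFSET = 0x214  # position of `mystring` in bootsector.bin (objdump listing, base 0)
IMAGE_BASE = 0x7C00      # physical address where the BIOS places the first sector

def string_linear_address(org: int, ds: int, label_offset: int = MYSTRING_OFFSET) -> int:
    """Physical address that `mov al, [si]` reads after `mov si, mystring`."""
    si = (org + label_offset) & 0xFFFF   # absolute offset NASM encodes into `mov si, ...`
    return (ds << 4) + si

for org, ds in [(0x7C00, 0x0000), (0x7C00, 0x07C0), (0x0000, 0x07C0)]:
    addr = string_linear_address(org, ds)
    ok = addr == IMAGE_BASE + MYSTRING_OFFSET
    print(f"org={org:#06x} ds={ds:#06x} -> {addr:#x} {'correct' if ok else 'wrong'}")
# org=0x7c00 ds=0x0000 -> 0x7e14 correct
# org=0x7c00 ds=0x07c0 -> 0xfa14 wrong
# org=0x0000 ds=0x07c0 -> 0x7e14 correct
```

This is why `xor ax, ax` / `mov ds, ax` is the safe setup for code assembled with `org 0x7c00`. Copying CS into DS only works if CS happens to be 0.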