data segment
    aa db 22h, 22h, 22h
    len = $-aa    
ends

stack segment
    dw   128  dup(0)
ends

code segment
start:
    mov ax, data
    mov ds, ax
    mov es, ax

    mov si, offset aa

    mov [si+len],al
    mov al,[si+len]   


    mov ax, 4c00h ; exit to operating system.
    int 21h    
ends

end start ; set entry point and stop the assembler.

emu8086 assembles the two instructions like this:

MOV [SI]+03H, AL 
MOV AL, 03H (This is wrong)

When `len` appears in the source operand, the whole memory operand is assembled as the immediate value of `len` (3).

But when I change the definition of `len` in the data segment to a literal:

len = 3

it then assembles correctly:

MOV [SI]+03H, AL
MOV AL, [SI]+03H

Why does the way `len` is defined change how the source operand is assembled?

REsky
  • Looks like an assembler bug -- what assembler are you using? – Chris Dodd Jun 02 '20 at 18:50
  • Hmm. Let's think about this for a second. `aa` only has 3 elements, so `aa + 3` isn't one of the elements of `aa`, it's the first value *after* `aa`, which is `len`, which is 3. – David Wohlferd Jun 02 '20 at 23:24
  • @DavidWohlferd: yes, the code is buggy even if it did assemble correctly, but EMU8086 is apparently also buggy. There's no way `mov al,[si+len]` should ever assemble to `MOV AL, 03H`. Possibly you could work around the assembler bug with `byte ptr [si + len]` or something, if you're stuck using emu8086 instead of a better assembler. – Peter Cordes Jun 03 '20 at 00:22
  • @PeterCordes Unless the value of [si+len] is known at compile time (which it is in this case). I've never seen an assembler perform optimizations. Doesn't mean it can't happen. – David Wohlferd Jun 03 '20 at 00:45
  • @DavidWohlferd: that would make it a compiler / binary to binary optimizer, not just an assembler. There are apparently such things, but we generally don't call them assemblers. Even MIPS assemblers that aggressively reorder instructions to fill branch delay slots didn't AFAIK change, combine, or remove single instructions, *just* reorder. I'm almost 100% certain EMU8086 isn't doing that on purpose, and this optimization looks buggy. That instruction is reloading the value of AL it just stored, which is the `data` segment value. Unless that happened to be `xx03h`, it's the wrong value. – Peter Cordes Jun 03 '20 at 00:51
  • @Chris Dodd: I'm using 'emu8086 - assembler and microprocessor emulator v4.05'. – REsky Jun 03 '20 at 02:31
  • @DavidWohlferd: Even when I append `bb db 22h, 22h, 22h` at the end of the data segment, the bug still occurs. – REsky Jun 03 '20 at 02:36
  • @PeterCordes: Thanks, but `byte ptr [si+len]` doesn't work either. – REsky Jun 03 '20 at 02:39
  • That was just a wild guess. Having a register as part of the addressing mode should already have ruled out an immediate source. Use a better, non-buggy, assembler like MASM if possible. (I prefer NASM syntax, but MASM and TASM are syntax-compatible with emu8086.) – Peter Cordes Jun 03 '20 at 02:51
  • I'd be more interested in what happens if you do `mov al,[si+len - 1]` or `mov al,[si]`, since that wouldn't be trying to do a `mov` from the contents of `len` anymore (which seems like undefined behavior, since `len` really shouldn't be allocating any memory). Or how about doing `len equ $-aa` instead of `=`? Maybe move `len` outside the data segment and define it as `len equ bb-aa`? – David Wohlferd Jun 03 '20 at 03:55
  • I think `len = $-aa` (or `len equ $-aa`) is defining a *constant* value that will be "hardcoded" into `MOV AL, 03H`. Since you're doing pointer arithmetic on memory addresses (`$` and `aa` are both addresses), maybe you should declare it as `len db $-aa` (just a guess). – Gomiero Jun 03 '20 at 04:20
  • @DavidWohlferd: It works when I move `len` into the code segment! `mov al, [si+len-1]` becomes `mov al, 02h`. `mov al, [si]` works because it doesn't contain `len`. `equ` behaves the same as `=`. – REsky Jun 03 '20 at 06:36
  • @Gomiero: Yes, `len db $-aa` works. But how does it "hardcode" an address into a constant value? – REsky Jun 03 '20 at 06:44
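
For reference, the encodings discussed in the comments can be sketched as follows. This is a minimal illustration, not emu8086's actual listing: it assumes `aa` sits at offset 0 of the data segment (so `len = $-aa` evaluates to 3), and the byte values in the comments were worked out by hand from the standard 8086 instruction format. The explanation of the `len db $-aa` workaround is an interpretation of what the thread reports, not something confirmed there.

```asm
; Assumption: aa is at offset 0 of the data segment, so len = $-aa = 3.

; What a correct assembler should emit for the two instructions:
    mov [si+3], al      ; 88 44 03   store AL to DS:[SI+3]
    mov al, [si+3]      ; 8A 44 03   load AL from DS:[SI+3]

; What emu8086 v4.05 reportedly emits for the second one instead:
    mov al, 3           ; B0 03      immediate load, no memory access

; Workarounds reported in the thread:
;   len = 3             a literal constant assembles correctly
;   len db $-aa         len becomes a byte variable at offset 3, so
;                       [si+len] is presumably read as
;                       [si + offset of len] = [si+3]
```

Note that `len db $-aa` changes the meaning of `len`: it allocates one byte of data and turns `len` into an address rather than a pure constant, so it happens to give the same displacement here only because `len` is placed immediately after `aa`.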
