INC opcode compiles to wrong address

Question

I'm compiling the following code, but it isn't working as expected.

Can someone explain why the following code doesn't work and how to correct it so it does?

DWORD data_location = 0x0100579C;
DWORD ret = 0x1002FFA;

void __declspec(naked) inc()
{
    // The following is what I'm trying to accomplish which works
    *(DWORD*)data_location = *(DWORD*)data_location + 1;

    __asm
    {   
        inc [data_location] //Should compile as FF 05 9C570001, instead compiles to the address containing the pointer to data_location
        // inc data_location also compiles to the same thing above

        jmp [ret]
    }
}

Rather than defining a variable for `data_location`, can you use a constant like `#define data_location 0x0100579C` and then, in the `__asm` statement, do `inc dword ptr ds:[data_location]`? — Michael Petch, Jul 18 '19 at 15:04
If you create a compile time constant for `ret` instead of a variable you could also achieve the absolute JMP with `#define ret 0x1002FFA` and then in the `__asm` statement do `mov eax, ret` `jmp eax` which would do absolute indirect through a register. — Michael Petch, Jul 18 '19 at 15:08
My first comment is based on the observation your inline comment suggests you wanted the behaviour `//Should compile as FF 05 9C570001` — Michael Petch, Jul 18 '19 at 15:10
@MichaelPetch: that's funny, I just had exactly the same thought and edited my answer with the same `#define ret` as you were suggesting, before reading your comment. — Peter Cordes, Jul 18 '19 at 17:52

500 - Internal Server Error · Answer 1 · 2019-07-18T13:56:10.047

2

If I'm understanding you correctly, you want something along the lines of

DWORD data_location = 0x0100579C;
DWORD ret = 0x1002FFA;

void __declspec(naked) inc()
{
    __asm
    {   
        mov eax, [data_location]
        inc dword ptr [eax]

        jmp [ret]
    }
}

edited Jul 18 '19 at 13:56

answered Jul 18 '19 at 12:02

500 - Internal Server Error



You need `inc dword ptr [eax]` to avoid ambiguity. For efficiency I'd recommend `add dword ptr [eax], 1` - it's 1 fewer uop on Intel CPUs. This can keep using `jmp [ret]`; memory indirect is fine there. Only the increment needs an extra load for the extra level of indirection. – Peter Cordes Jul 18 '19 at 12:50

Peter Cordes · Accepted Answer · 2019-07-18T18:26:13.133

2

[data_location] is the same thing as data_location in MASM syntax. Square brackets are optional, not the extra level of indirection you need to deref a pointer from static storage.

Remember that in C, data_location gives you the value from memory, and your C is then dereferencing that. But inline asm uses asm syntax.
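
To make the two levels concrete, here is a small portable C sketch (plain C, not MSVC inline asm). The names `target_value`, `inc_variable_itself` and `inc_pointed_to` are made up for illustration, and `target_value` is only a stand-in for the DWORD that lives at 0x0100579C in the question.

```c
#include <stdint.h>

/* Stand-in for the DWORD the question wants to increment (assumption:
   in the real program it lives at the fixed address 0x0100579C). */
uint32_t target_value = 41;

/* A variable in static storage that holds the address of the target,
   analogous to `DWORD data_location = 0x0100579C;`. */
uintptr_t data_location = (uintptr_t)&target_value;

/* Roughly what `inc [data_location]` means in MASM syntax: the variable
   itself (the stored address) is incremented. */
void inc_variable_itself(void)
{
    data_location += 1;
}

/* What the question wants: load the stored address first, then increment
   the DWORD it points to (mov eax, [data_location] / inc dword ptr [eax]). */
void inc_pointed_to(void)
{
    uint32_t *p = (uint32_t *)data_location;
    *p += 1;
}
```

Calling `inc_pointed_to()` changes `target_value` and leaves `data_location` alone; calling `inc_variable_itself()` does the opposite, which is the behaviour the question observed.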

If you want it to assemble with the address hard-coded into the instruction, you need to make the address a preprocessor constant, not just a DWORD variable in static storage.

#define data_location  0x0100579C
#define ret_addr  0x1002FFA

void __declspec(naked) inc()
{
    //++*(DWORD*)data_location;
    //((void (*)(void))ret_addr)();

    __asm
    {   
        add  dword ptr ds:[data_location], 1
         // add dword ptr ds:[0x0100579C], 1   // after C preprocessor

        mov  eax, ret_addr
        jmp  eax
    }
}

Apparently a ds: is necessary to make MASM/MSVC treat [0x12345] as a memory operand, not an immediate. But it also has the downside of actually emitting a redundant ds prefix byte in the machine code.
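
For reference, the bytes involved can be written out by hand. The following is a sketch in portable C of the 32-bit encoding of `add dword ptr [disp32], imm8` (opcode 83 /0 with a disp32-only ModRM byte), with the DS override as an optional leading 3E. The function name `encode_add_mem_imm8` and its parameters are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode `add dword ptr [addr], imm` (32-bit mode, imm8 form) into out[].
   Returns the number of bytes written: 7 without the prefix, 8 with it.
   Hypothetical helper for illustration only. */
size_t encode_add_mem_imm8(uint8_t *out, uint32_t addr, uint8_t imm,
                           int with_ds_prefix)
{
    size_t n = 0;
    if (with_ds_prefix)
        out[n++] = 0x3E;                      /* DS segment-override prefix */
    out[n++] = 0x83;                          /* opcode: r/m32 with sign-extended imm8 */
    out[n++] = 0x05;                          /* ModRM: mod=00, reg=/0 (ADD), rm=101 (disp32) */
    out[n++] = (uint8_t)(addr & 0xFF);        /* disp32, least significant byte first */
    out[n++] = (uint8_t)((addr >> 8) & 0xFF);
    out[n++] = (uint8_t)((addr >> 16) & 0xFF);
    out[n++] = (uint8_t)(addr >> 24);
    out[n++] = imm;                           /* imm8 */
    return n;
}
```

With `addr = 0x0100579C`, `imm = 1` and the prefix enabled, this produces `3E 83 05 9C 57 00 01 01`; without the prefix the same instruction is one byte shorter.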

Obviously you could make this much more efficient by actually using ++*(DWORD*)data_location; and letting the compiler inline the add or inc instruction. Forcing a caller to actually call this stub function will just slow you down.

add [mem], immediate is only 2 uops, vs. 3 for memory-destination inc on Intel CPUs. It only costs 1 extra byte of code-size.

jmp [ret] with DWORD ret = ...; will work, but is an unfortunate choice. You don't really need to load the target address from static storage. Ideally you'd jmp 0x1002FFA and let the assembler calculate a relative offset to that absolute destination. But unfortunately MASM syntax and/or Windows .obj files don't support that.
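
The relative offset an assembler would compute for a direct `jmp rel32` is simple arithmetic: the displacement is measured from the end of the 5-byte instruction (opcode E9 plus four displacement bytes). A sketch in portable C, where `jmp_rel32` and `encode_jmp_rel32` are hypothetical helper names and the address of the jump itself is something you would have to know, which is exactly what inline asm does not give you:

```c
#include <stdint.h>

/* Displacement for a 5-byte `jmp rel32` located at jmp_addr that should
   land on target. The offset is relative to the next instruction. */
int32_t jmp_rel32(uint32_t jmp_addr, uint32_t target)
{
    return (int32_t)(target - (jmp_addr + 5u));
}

/* Write the 5 bytes of `jmp rel32` (E9 followed by the little-endian
   displacement) into out[]. */
void encode_jmp_rel32(uint8_t out[5], uint32_t jmp_addr, uint32_t target)
{
    uint32_t rel = target - (jmp_addr + 5u);
    out[0] = 0xE9;
    out[1] = (uint8_t)(rel & 0xFF);
    out[2] = (uint8_t)((rel >> 8) & 0xFF);
    out[3] = (uint8_t)((rel >> 16) & 0xFF);
    out[4] = (uint8_t)(rel >> 24);
}
```

For example, assuming purely for illustration that the jump sits at 0x01001000, the displacement to 0x1002FFA is 0x1FF5 and the bytes are `E9 F5 1F 00 00`.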

If you can use a tmp register, mov-immediate of the address into the register avoids needing any static data, potentially allowing the front-end to sort out a branch mispredict sooner. It's still an indirect branch, though.

Also, if you ever actually call this function, remember that the caller will have pushed a return address which you leave on the stack, so this is like a tailcall.

In fact, you could get the compiler to emit a jmp for you if you simply made a normal function call with no args at the end of a void function.
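
A minimal sketch of that idea in plain C, using stand-ins so it can actually run: `counter` replaces the DWORD at 0x0100579C and `continuation` replaces the code at 0x1002FFA (all names here are made up for illustration). Whether the final call really becomes a `jmp` depends on the compiler and its optimization settings, so treat that part as an expectation rather than a guarantee.

```c
#include <stdint.h>

/* Stand-in for the DWORD being incremented. */
uint32_t counter = 0;

/* Stand-in for the code the stub should continue at. */
int continued = 0;
void continuation(void)
{
    continued = 1;
}

/* Ordinary void function: increment, then call with no args as the last
   statement. An optimizing compiler may emit the call as a tail jump. */
void inc_then_continue(void)
{
    ++counter;
    continuation();
}
```

In the real program the two stand-ins would be casts of the fixed addresses, as in the commented-out C lines of the answer's code.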

edited Jul 18 '19 at 18:26

answered Jul 18 '19 at 12:52

Peter Cordes



A peculiarity with MASM is that `add dword ptr [data_location], 1` won't be accepted. If you have just an absolute value for the address you have to qualify the memory operand with a segment (ie. DS). What should work is `add dword ptr ds:[data_location], 1` . Note: the segment should be outside the brackets (not inside) – Michael Petch Jul 18 '19 at 17:58
`add dword ptr [data_location], 1` will produce the error `error C2415: improper operand type` – Michael Petch Jul 18 '19 at 17:59
@MichaelPetch: Thanks. I guess this is the same syntax rule as Ross mentions in [Confusing brackets in MASM32](//stackoverflow.com/q/25129743) where `mov eax, [const]` is a mov-immediate!! (for `const = 42` or something). I don't like MASM syntax at all; too much weirdness and magic for my taste. – Peter Cordes Jul 18 '19 at 18:03

Yes, and one other peculiarity is that when you do use `add dword ptr ds:[data_location], 1` MASM is going to emit the DS segment override. So the encoding is actually `3E 83 05 9C 57 00 01 01` . The only way to avoid that to my knowledge in inline assembly is to use the `__emit` keyword inside the `__asm` block to emit the instruction manually lol: `__emit 83h` `__emit 05h` `__emit data_location & 0ffh` `__emit (data_location >> 8) & 0ffh` `__emit (data_location >> 16) & 0ffh` `__emit data_location >> 24` `__emit 1` – Michael Petch Jul 18 '19 at 18:16
Consider that a fun fact, but not really useful. `__emit` has downsides because if you emit instructions this way MSVC doesn't parse the data emitted to see what registers might be modified and can't automatically keep track of what registers need to be saved and restored. Doesn't apply here since no register is clobbered. MSVC doesn't know if what is emitted is instructions or data so it leaves it up to the programmer to handle it. – Michael Petch Jul 18 '19 at 18:21

@MichaelPetch: interesting. I don't think a redundant DS prefix has a performance downside other than code-size, but that's still terrible that there's no MSVC or MASM syntax for the shortest encoding of a `[disp32]` addressing mode with a numeric constant. – Peter Cordes Jul 18 '19 at 18:24

@Petercodes : I agree, but I more or less pointed it out in the event the OP tries this idea and wonders why there is an extra byte to his instruction that he might find unexpected. – Michael Petch Jul 18 '19 at 21:12
@Petercodes : Thank you both for your detailed responses. Is there a more compact way to `__emit` than one byte at a time if the machine code is known? In your response, why does `0ffh` need the extra leading 0? I've noticed that without it (`ffh`) the compiler errors with C2415, whereas `0xFF` does not. Also, why does `jmp [ret]` work with static storage? – a_dizzle Jul 19 '19 at 08:20
@MichaelPetch: the OP wants to know about `__emit` of more than 1 byte at a time. I have no idea. (See prev. comment.) – Peter Cordes Jul 19 '19 at 12:49
@a_dizzle :there is no way to emit more than one byte at a time. Each emit specifies one byte. There is no emit word or dword either. – Michael Petch Jul 19 '19 at 12:51
@a_dizzle: `jmp [ret]` works with a (function/code) pointer in static storage because jumps are like a load into EIP. Dereferencing the code pointer doesn't logically happen until instruction-fetch after the `jmp`. (Of course in reality there's branch prediction that doesn't have to wait for the branch to actually complete, but the ISA model is that jumps are like a `mov eip, src`, so it's fine for src to be memory.) re: emit: I pinged Michael Petch for you. – Peter Cordes Jul 19 '19 at 12:54
@a_dizzle : When using the suffix `h` for creating a hexadecimal value and the first digit is between A and F, you must prefix the number with a 0 for it to be recognized as a number. The reason for this is so the parser can distinguish between a number and an identifier. If the first digit is a value between 0 and 9 it is clear it is a number. But if it starts with A to F the parser assumes it is an identifier and not a number. The 0 in front lets the parser know that what it is dealing with is in fact the hex value `ffh` and not the identifier named `ffh` . – Michael Petch Jul 19 '19 at 12:58
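
The rule described above can be sketched as a small check in portable C. This only illustrates the stated rule (a token with an `h` suffix counts as a number only if it starts with a decimal digit); it is not how MSVC's parser is actually implemented, and `is_h_suffixed_hex_number` is a made-up name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if tok looks like an h-suffixed hex number under the rule
   above: first character 0-9, last character h/H, hex digits in between. */
int is_h_suffixed_hex_number(const char *tok)
{
    size_t len = strlen(tok);
    if (len < 2)
        return 0;
    if (!isdigit((unsigned char)tok[0]))
        return 0;                     /* starts with A-F: treated as an identifier */
    if (tok[len - 1] != 'h' && tok[len - 1] != 'H')
        return 0;
    for (size_t i = 0; i + 1 < len; i++) {
        if (!isxdigit((unsigned char)tok[i]))
            return 0;
    }
    return 1;
}
```

Under this check `0ffh`, `83h` and `05h` are numbers, while `ffh` is not, which matches the C2415 error mentioned in the question.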
@MichaelPetch: if you're using `__emit` anyway, you could maybe put a label inside your inline asm and use that as a reference to manually encode a `jmp rel32`? (I'm imagining something like `extern char callsite[]` to associate the address with a C var, then doing `static const uint32_t rel32 = 0x1002FFA - (callsite + 5)`) – Peter Cordes Jul 19 '19 at 12:58
@PeterCordes : Don't think it will work because AND (`&`), OR (`|`) etc need to operate on constant expressions and I believe the compiler/internal assembler will balk at trying to do an operation on expressions that it thinks are not constant (or absolute). – Michael Petch Jul 19 '19 at 13:18
@MichaelPetch: oh right, that would need 4 separate byte relocations in the `.obj` for shifted parts of the address. GCC for an ELF target couldn't do that either. (But it could do a whole absolute address wrt. current position by assembling `asm goto("jmp 0x1002FFA" :::);` or something.) To build a `call rel32`, you'd have to put bytes into an executable buffer at runtime. (e.g. VirtualProtect the page containing a static array that had alignment to make sure it didn't cross a page boundary.) – Peter Cordes Jul 19 '19 at 13:31
