Why does + act differently for constants vs the rest of data types in x64 assembly language?

Question

If I have a quad word named "a", this instruction: MOV RAX, a + 3 will offset the address of a by 3.

When I do the same thing with a constant (a EQU 10), MOV RAX, a + 3 moves the number 13 into RAX.

Why does the CPU do that if it is the same "+" sign and the same MOV instruction?

I wrote this code and, while debugging, I found this strange behavior. I just can't understand why, and I can't find the answer online.

BTW I'm using MASM

But you might want to interpret it as an address, e.g. `vgamem EQU 0b8000h`. I believe masm requires the brackets then (or maybe a `qword ptr`). — Jester, Aug 16 '23 at 20:27

Peter Cordes · Answer 1 · 2023-08-17T02:48:55.177

The CPU runs machine code, not asm source.

In MASM the same instruction syntax means different things depending on whether the operand is a "variable" (symbol defined by a label) or an assemble-time constant (equ or a = 10).

Other assemblers (like NASM) don't have that inconsistency, so mov rax, a+3 would always be a mov rax, imm64 of an immediate constant, whether that's an address (link-time constant) or an assemble-time equ constant. (In MASM, that would be mov rax, OFFSET a+3).

(NASM would actually optimize it to mov eax, 13, not 10-byte mov r64, imm64, but still a mov-immediate either way.)

The inconsistency isn't in +, it always does addition.
When a is a symbol / label attached to a dq, a+3 is that symbol address plus three.
When a is an assemble-time constant defined with equ or a = 10, a+3 is 10+3.

A symbol's "value" is its address, like if you declared it in C as extern char a[]. In MASM, if you did dq a+3, it would basically work the same whether it's a label or equ constant: add 3 to a link-time constant (address) or assemble-time constant (equ), and assemble those 8 bytes into the output file at the current position.
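To make the extern char a[] analogy concrete, here is a rough C sketch, not MASM itself. The names a_label and A_EQU are made up for illustration: a_label stands in for a label on a dq 10, and A_EQU stands in for a EQU 10. In both cases + is plain addition; only the thing being added to differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for "a dq 10": a label attached to 8 bytes of storage
   (little-endian layout of the value 10). */
unsigned char a_label[8] = {10, 0, 0, 0, 0, 0, 0, 0};

/* Hypothetical stand-in for "a EQU 10": a pure number with no storage. */
enum { A_EQU = 10 };

/* Label + 3: the address three bytes past the start of the storage. */
const unsigned char *label_plus_3(void) { return a_label + 3; }

/* Byte distance between label+3 and the label itself. */
ptrdiff_t label_offset(void) { return label_plus_3() - a_label; }

/* Constant + 3: ordinary arithmetic on the number, folded at compile time. */
uint64_t equ_plus_3(void) { return A_EQU + 3; }

/* Small self-check that can be called from a main() of your own. */
void demo_symbol_vs_constant(void) {
    assert(label_offset() == 3);   /* address moved by 3 bytes */
    assert(equ_plus_3() == 13);    /* value became 10 + 3 */
}
```

Calling demo_symbol_vs_constant() from your own main should pass both checks: the same + gives "address plus 3" in one case and "10 plus 3" in the other.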

The inconsistency is in how operands to asm instructions work in MASM: see Confusing brackets in MASM32

With or without square brackets, mov rax, [a] or [a+3] is a load from the address given by the expression a or a+3, if a is a label.
But if a is an assemble-time constant, the instruction is a mov-immediate.

If you wanted to add to data being loaded from memory, like C uint64_t tmp = a + 3; where a is a global variable, you'd have to mov rax, [a] / add rax, 3.

But if a is a compile-time constant like C++ static const uint64_t a = 10; or #define a 10, then the compiler can do the addition at compile time, like mov eax, 10+3 which assembles the same as mov eax, 13.
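The two C cases above can be put side by side in a short sketch. The names a_var and A_CONST are illustrative, not from the question. What a compiler emits depends on the compiler and its options, so the asm in the comments is only the typical shape, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Like "a dq 10" in .data: storage whose contents can change at run time.
   Being a non-const global, it generally has to be loaded when it is read. */
uint64_t a_var = 10;

/* Like "a EQU 10": just a number known at compile time. */
#define A_CONST 10

/* Typically a load followed by an add, e.g. mov rax, [a_var] / add rax, 3. */
uint64_t add_to_variable(void) { return a_var + 3; }

/* Typically a single mov-immediate of 13: the addition happens at compile time. */
uint64_t add_to_constant(void) { return A_CONST + 3; }

/* Small self-check that can be called from a main() of your own. */
void demo_variable_vs_constant(void) {
    assert(add_to_variable() == a_var + 3);
    assert(add_to_constant() == 13);
}
```

If you store a different value into a_var, add_to_variable() follows it, while add_to_constant() always returns 13. That is the practical difference between data in memory and an assemble-time constant.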

The only x86 instruction that can load something from memory and add to the load result is add reg, [mem]. Like mov eax, 3 / add rax, [a]. (But that's less efficient than load+add reg,imm: larger code size and more back-end uops.)

(Intel's proposed APX (Advanced Performance Extensions) will introduce EVEX encodings of integer instructions, making add rax, [a], 3 possible, with a register destination separate from the two sources.)

Any normal memory operand can use an addressing mode which can involve some address math, but address math is separate from math on data operands. (Just like you can't do mov rax, rcx + 3, you need an add instruction to do math on values. Or an LEA to copy-and-add.) A different part of the CPU (the load/store execution units) handles address math like [rdx + rcx*8 + 3], and it gets encoded differently in the machine code.
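As a loose C analogy for that split (pointer arithmetic playing the role of address math, integer arithmetic the role of data math), here is a small sketch. The table and function names are invented for illustration, and the asm in the comments shows the rough shape only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte table where each byte holds its own index. */
const unsigned char table[32] = {
     0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
    16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
};

/* Address math only, no memory access: roughly lea rax, [rdx + rcx*8 + 3]. */
const unsigned char *address_math(const unsigned char *base, size_t idx) {
    return base + idx * 8 + 3;
}

/* Address math feeding a load: roughly movzx eax, byte ptr [rdx + rcx*8 + 3]. */
unsigned char load_with_address_math(const unsigned char *base, size_t idx) {
    return base[idx * 8 + 3];
}

/* Data math on a value already in a register: roughly lea rax, [rcx + 3],
   or mov rax, rcx / add rax, 3. */
uint64_t data_math(uint64_t value) { return value + 3; }

/* Small self-check that can be called from a main() of your own. */
void demo_address_vs_data_math(void) {
    assert(address_math(table, 2) - table == 19);    /* 2*8 + 3 */
    assert(load_with_address_math(table, 2) == 19);  /* byte stored at offset 19 */
    assert(data_math(10) == 13);
}
```

The + 3 in address_math changes where you look; the + 3 in data_math changes the value you already have.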

Perhaps that's what you're thinking of as an inconsistency, if you're thinking of a dq 10 as giving a the value 10 the same way a equ 10 does. It doesn't; it puts those bytes in data memory. That's similar but different: the 10 isn't an assemble-time constant, so you can't use its value in expressions, and it's only accessible with load/store instructions.

BTW, mov rax, a+3 doesn't involve any run-time address math, at least not for the +3. The linker resolves a+3 to a RIP-relative addressing mode just like with a, but offset by 3. So for example it might be [rip + 1013h] for a vs. [rip + 1016h] for a+3.

PS: I mentioned NASM a few times as a point of comparison. See also:

https://www.nasm.us/xdoc/2.11.08/html/nasmdoc2.html#section-2.2.2
Why in NASM do we have to use square brackets ([ ]) to MOV to memory location?
What's the difference between equ and db in NASM?
how to get address of variable and dereference it in nasm x86 assembly? - note that MASM does use the term "variable" for things like a dq 10 in a .data section, and it actually implies an operand-size when you use it like add a, 1. Other assemblers don't have that high-level concept, they just have labels/symbols you can put before or after data, which you can use to implement static variables.
