Partial register access in 64-bit mode
In 64-bit mode, the following rules apply when writing to a register narrower than 64 bits:
- If a 32-bit register is written, the upper 32 bits of the associated 64-bit register are cleared to zero.
- If a 16- or 8-bit register is written, the upper 48 or 56 bits of the associated 64-bit register remain unchanged.
If only an 8-bit register is written, the old value of the associated 64-bit register must therefore first be obtained, the 8-bit sub-register changed, and the new value then saved. In other words, the write depends on the register's previous contents.
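The rules above can be sketched as a small C model. This is only an illustration of the architectural result, not real register access: the helper names write32, write16 and write8 are hypothetical, and each one takes the old 64-bit register value and returns the value after the write.

```c
#include <stdint.h>

/* Hypothetical model helpers (not real instructions or any library API).
   Each takes the old 64-bit register value and returns the value
   the register holds after the partial write. */

/* 32-bit write (e.g. mov eax, imm32): the upper 32 bits become zero. */
uint64_t write32(uint64_t old, uint32_t value) {
    (void)old;                      /* the old contents do not matter */
    return (uint64_t)value;
}

/* 16-bit write (e.g. mov ax, imm16): the upper 48 bits are kept. */
uint64_t write16(uint64_t old, uint16_t value) {
    return (old & ~UINT64_C(0xFFFF)) | value;
}

/* 8-bit write to the low byte (e.g. mov al, imm8): the upper 56 bits are kept. */
uint64_t write8(uint64_t old, uint8_t value) {
    return (old & ~UINT64_C(0xFF)) | value;
}
```

Starting from 0x1122334455667788, the model gives 0x00000000AABBCCDD after write32 with 0xAABBCCDD, but 0x11223344556677AA after write8 with 0xAA. Only the 16- and 8-bit helpers need the old value as an input, which is where the dependency comes from.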
Example 6.13 from Agner Fog's microarchitecture manual is not related to this; it is only an alternative to movzx, because that instruction is slow on older Pentium processors.
mov or or?
The two lines
31 C0 xor eax, eax
8A 05 ## ## ## ## mov al, byte [mem8]
(opcodes on the left) are probably faster than if you replaced the second line with
0A 05 ## ## ## ## or al, byte [mem8]
since or has a dependency on the previous line: only once xor eax, eax has been executed can the new value in eax be passed on to or. In addition, just as in the variant with mov, there may be a slowdown because only a partial register is accessed. Instead, I would suggest replacing these two lines with
0F B6 05 ## ## ## ## movzx eax, byte [mem8]
This is one byte shorter than the previous approach, and it is a single instruction that writes a full 32-bit register. As Agner Fog said:
"The easiest way to avoid partial register stalls is to always use full registers and use MOVZX or MOVSX when reading from smaller memory operands."
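To see that both sequences leave the same value in the register, here is a minimal C model of the two variants. The function names load_xor_mov and load_movzx are hypothetical and only mimic the architectural result; they say nothing about timing.

```c
#include <stdint.h>

/* Model of:  xor eax, eax  /  mov al, byte [mem8]
   (hypothetical helper, mimics the architectural result only) */
uint64_t load_xor_mov(uint64_t rax, uint8_t mem8) {
    /* xor eax, eax: a 32-bit write, so the whole 64-bit register becomes 0 */
    rax = (uint64_t)((uint32_t)rax ^ (uint32_t)rax);
    /* mov al, [mem8]: an 8-bit write, merged into the previous register value */
    rax = (rax & ~UINT64_C(0xFF)) | mem8;
    return rax;
}

/* Model of:  movzx eax, byte [mem8]
   (hypothetical helper, mimics the architectural result only) */
uint64_t load_movzx(uint64_t rax, uint8_t mem8) {
    (void)rax;                        /* the result does not depend on the old value */
    return (uint64_t)(uint32_t)mem8;  /* zero-extend to 32 bits, upper 32 bits cleared */
}
```

For any old register value and any byte, both functions return the same result. The difference is that the second step of the first variant has to read the previous register value, while the movzx variant does not.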