Is an empty line of code that ends with a semicolon equivalent to an asm("nop") instruction?
No, of course not. You could have trivially tried it yourself. (On your own machine, or on the Godbolt compiler explorer, https://godbolt.org/)
You wouldn't want innocent CPP macros to introduce a NOP if FOO(x); expanded to just ; because the appropriate definition for FOO() in this case was the empty string.
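To make the macro case concrete, here is a minimal sketch. The names FOO and scaled are hypothetical, chosen only for illustration: FOO(x) is defined as empty, so the statement FOO(x); is just a null statement, which the C language defines as performing no operations.

```c
/* Hypothetical hook that is compiled out in this build:
   FOO(x) expands to nothing, so "FOO(x);" becomes the null statement ";". */
#define FOO(x)

int scaled(int x) {
    FOO(x);        /* expands to ";": a null statement, not a request for a NOP */
    ;              /* writing the null statement by hand means the same thing */
    return x * 2;
}
```

Comparing the optimized asm for this function with and without the two null statements on Godbolt should show identical output.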
__nop() is not a library function. It's an intrinsic that does exactly what you want. For example:
#ifdef USE_NOP
#ifdef _MSC_VER
#include <intrin.h>
#define NOP() __nop() // _emit 0x90
#else
// assume __GNUC__ inline asm
#define NOP() asm("nop") // implicitly volatile
#endif
#else
#define NOP() // no NOPs
#endif
int idx(int *arr, int b) {
    NOP();
    return arr[b];
}
This compiles with Clang 7.0 -O3 for x86-64 Linux to this asm:
idx(int*, int):
nop
movsxd rax, esi # sign extend b
mov eax, dword ptr [rdi + 4*rax]
ret
It compiles with 32-bit x86 MSVC 19.16 -O2 -Gv to this asm:
int idx(int *,int) PROC ; idx, COMDAT
npad 1 ; pad with a 1 byte NOP
mov eax, DWORD PTR [ecx+edx*4] ; __vectorcall arg regs
ret 0
and compiles with x64 MSVC 19.16 -O2 -Gv to this asm (Godbolt for all of them):
int idx(int *,int) PROC ; idx, COMDAT
movsxd rax, edx
npad 1 ; pad with a 1 byte NOP
mov eax, DWORD PTR [rcx+rax*4] ; x64 __vectorcall arg regs
ret 0
Interestingly, the sign-extension of b to 64-bit is done before the NOP. Apparently x64 MSVC requires (by default) that functions start with an instruction at least 2 bytes long (after the prologue of 1-byte push instructions, maybe?), so that they support hot-patching with a jmp rel8.
If you use this in a 1-instruction function, you get an npad 2 (2 byte NOP) before the npad 1 from x64 MSVC:
int bar(int a, int b) {
    __nop();
    return a+b;
}
;; x64 MSVC 19.16
int bar(int,int) PROC ; bar, COMDAT
npad 2
npad 1
lea eax, DWORD PTR [rcx+rdx]
ret 0
I'm not sure how aggressively MSVC will reorder the NOP with respect to pure register instructions, but a^=b; after the __nop() will actually result in xor ecx, edx before the NOP instruction.
But wrt. memory access, MSVC decides not to reorder anything to fill that 2-byte slot in this case.
int sink;
int foo(int a, int b) {
    __nop();
    sink = 1;
    //a^=b;
    return a+b;
}
;; MSVC 19.16 -O2
int foo(int,int) PROC ; foo, COMDAT
npad 2
npad 1
lea eax, DWORD PTR [rcx+rdx]
mov DWORD PTR int sink, 1 ; sink
ret 0
It does the LEA first, but doesn't move it before the __nop(); that seems like an obvious missed optimization, but then again, if you're inserting __nop() instructions, optimization is clearly not the priority.
If you compiled to a .obj or .exe and disassembled, you'd see a plain 0x90 nop. But Godbolt doesn't support that for MSVC, only Linux compilers, unfortunately, so all I can do easily is copy the asm text output.
And as you'd expect, with the __nop() ifdefed out, the functions compile normally, to the same code but with no npad directive.
The nop instruction will run as many times as the NOP() macro does in the C abstract machine. The optimizer does not guarantee its ordering wrt. surrounding non-volatile memory accesses or wrt. calculations in registers.
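As a rough illustration of the "runs as many times as in the abstract machine" point, here is a sketch assuming GNU C (GCC or Clang) and a target whose assembler accepts the nop mnemonic. The function name run_nops is hypothetical.

```c
/* GNU C only. An asm statement with no output operands is implicitly
   volatile, so the compiler may not delete it even though it has no
   visible effect. */
unsigned run_nops(unsigned n) {
    unsigned i;
    for (i = 0; i < n; ++i) {
        __asm__("nop");   /* one nop executes per iteration */
    }
    return i;             /* number of nops that ran */
}
```

The compiler is still free to unroll the loop or schedule register arithmetic around the nop; only the number of executions is pinned down.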
If you want it to be a compile-time memory reordering barrier, for GNU C use asm("nop" ::: "memory"). For MSVC, that would have to be separate, I assume.
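A minimal sketch of the GNU C variant with the "memory" clobber, assuming GCC or Clang. The macro name NOP_BARRIER, the global shared_value, and the function store_then_reload are hypothetical names used only for this example.

```c
/* GNU C only. The "memory" clobber tells the compiler that the asm may
   read or write any memory, so it cannot move loads or stores across it
   at compile time. It does not emit a CPU fence instruction. */
#define NOP_BARRIER() __asm__ __volatile__("nop" ::: "memory")

int shared_value;   /* hypothetical global, for illustration */

int store_then_reload(int v) {
    shared_value = v;      /* this store has to be emitted before the nop */
    NOP_BARRIER();
    return shared_value;   /* compiler must assume memory may have changed */
}
```

Without the clobber, the compiler would be allowed to return v directly and sink the store below the nop; with it, the generated asm should show the store, the nop, and then a reload.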