
gcc can use multi-byte NOPs when aligning loops and functions. However, when I tried the `-fpatchable-function-entry` option, it always emits single-byte NOPs.

You can see in this example that gcc aligns the function with `nop DWORD PTR [rax+rax*1+0x0]` and `nop WORD PTR cs:[rax+rax*1+0x0]`, but uses eight 0x90 NOPs at the function entry when I specify `-fpatchable-function-entry=8,3`.

I saw this in the documentation:

-fpatchable-function-entry=N[,M]

  • Generate N NOPs right at the beginning of each function, with the function entry point before the Mth NOP. If M is omitted, it defaults to 0 so the function entry points to the address just at the first NOP. The NOP instructions reserve extra space which can be used to patch in any desired instrumentation at run time, provided that the code segment is writable. The amount of space is controllable indirectly via the number of NOPs; the NOP instruction used corresponds to the instruction emitted by the internal GCC back-end interface gen_nop. This behavior is target-specific and may also depend on the architecture variant and/or other compilation options.

It clearly says that N NOPs will be inserted. However, I think this should be an N-byte NOP (or whatever optimal number of NOPs fills the N-byte space). Similarly, if M is specified, it should emit an M-byte NOP and an (N − M)-byte NOP.
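
A minimal C sketch of the layout proposed in the previous paragraph, assuming the option took a byte count rather than an instruction count (which, as Ross Ridge notes in the comments, is not what GCC implements). The helper names `fill_nops` and `layout_patch_area` are hypothetical, not part of GCC. Each region is filled greedily with the longest of the nine multi-byte NOP encodings recommended in Intel's Software Developer's Manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Recommended x86 multi-byte NOP encodings (Intel SDM), indexed by length - 1. */
static const uint8_t kNops[9][9] = {
    {0x90},                                                 /* nop */
    {0x66, 0x90},                                           /* xchg ax,ax (66 NOP) */
    {0x0F, 0x1F, 0x00},                                     /* nop DWORD PTR [rax] */
    {0x0F, 0x1F, 0x40, 0x00},                               /* nop DWORD PTR [rax+0x0] */
    {0x0F, 0x1F, 0x44, 0x00, 0x00},                         /* nop DWORD PTR [rax+rax*1+0x0] */
    {0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00},                   /* nop WORD PTR [rax+rax*1+0x0] */
    {0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},             /* nop DWORD PTR [rax+0x0], disp32 */
    {0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},       /* nop DWORD PTR [rax+rax*1+0x0], disp32 */
    {0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00}, /* nop WORD PTR [rax+rax*1+0x0], disp32 */
};

/* Hypothetical helper: fill `len` bytes at `buf` with as few NOP
   instructions as possible; returns how many instructions were emitted. */
static size_t fill_nops(uint8_t *buf, size_t len) {
    size_t count = 0;
    while (len > 0) {
        size_t n = len < 9 ? len : 9; /* longest encoding that still fits */
        memcpy(buf, kNops[n - 1], n);
        buf += n;
        len -= n;
        count++;
    }
    return count;
}

/* Hypothetical byte-based N,M layout: M bytes of NOPs before the entry point
   and N - M bytes after it. `buf` must hold N bytes. Returns the entry offset. */
static size_t layout_patch_area(uint8_t *buf, size_t n, size_t m) {
    assert(m <= n);
    fill_nops(buf, m);
    fill_nops(buf + m, n - m);
    return m;
}
```

Under this interpretation, `-fpatchable-function-entry=8,3` would give one 3-byte NOP before the entry point and one 5-byte NOP after it: two instructions instead of eight. Changing the existing option to behave like this is the incompatible change Ross Ridge warns about.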

So why does gcc do this? Can we make it generate multi-byte NOPs? And are two 0x90 NOPs better than Microsoft's mov edi, edi?

phuclv
  • The multi-byte NOPs are probably generated by the assembler through its alignment directive. Note that strictly speaking the option is working as documented and expected: the parameter is the number of NOP instructions inserted, not the number of bytes of NOP instructions. Changing this behaviour so it worked the way you think it should would be an incompatible change and could break existing applications that use this option. You might want to look at the `ms_hook_prologue` function attribute to see if it does what you want; otherwise I think you'd need to implement this yourself. – Ross Ridge Aug 16 '18 at 16:09
  • Multiple single-byte NOPs generally suck; each one takes an entry in the uop cache and a separate slot of front-end issue bandwidth. Along with `push` and other short instructions at the start of a large function, you could end up with too many uops from the first 32 bytes of machine code in a function to fit in 3 lines of up-to-6 uops on Sandybridge-family. – Peter Cordes Aug 17 '18 at 00:16
  • Seems like the current best option is to [use Clang 10](https://godbolt.org/z/5SCpR7). Also, from this [relevant GCC mailing list thread](https://gcc.gnu.org/legacy-ml/gcc/2020-01/msg00020.html), it seems like this is a "lazy" design choice. – Marco Bonelli May 12 '20 at 01:30
  • In addition to what @PeterCordes said, and in support of the `ms_hook_prologue` suggestion: multi-byte NOPs (or neutral instructions like `xchg ax,ax` or `mov edi, edi`) enable atomic flow redirection. You can replace such an instruction with a backward short jump to a pre-initialized larger thunk (possibly including a relocated leading instruction of 2+ bytes; indeed, on MSVC `/functionpadmin` only guarantees that the first instruction is longer than 2 (x86) or 3 (x64) bytes) without the risk of the CPU reading in between [single-byte instructions]. But the trouble with `ms_hook_prologue` is that it's an attribute, not a global setting. – Arty Aug 05 '21 at 01:59
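
The uop-cache arithmetic in Peter Cordes' comment can be made concrete with a small C sketch. It assumes each NOP decodes to a single uop and treats the limit as a plain budget of 3 lines × 6 uops = 18 uops per 32-byte window, ignoring the other uop-cache placement rules. The function names and the count of 12 "other" prologue uops used below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sandybridge-family, per the comment: a 32-byte window of code can use at
   most 3 uop-cache lines holding up to 6 uops each. */
enum { UOP_LINES_PER_WINDOW = 3, UOPS_PER_LINE = 6 };

/* NOP instructions needed to pad `bytes` bytes when no NOP may be longer
   than `max_nop_len` bytes; each NOP is assumed to be one uop. */
static size_t nop_uops(size_t bytes, size_t max_nop_len) {
    assert(max_nop_len > 0);
    return (bytes + max_nop_len - 1) / max_nop_len; /* ceiling division */
}

/* Simplified check: compares the uop count against the 18-uop budget only. */
static bool fits_uop_cache(size_t total_uops) {
    return total_uops <= UOP_LINES_PER_WINDOW * UOPS_PER_LINE;
}
```

For example, `fits_uop_cache(nop_uops(8, 1) + 12)` is false (eight single-byte NOPs plus 12 other uops is 20, over the budget of 18), while `fits_uop_cache(nop_uops(8, 9) + 12)` is true (one 8-byte NOP plus the same 12 is 13).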

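The redirection Arty describes can be illustrated by simulating the byte layout in an ordinary buffer, assuming the classic Windows hot-patch scheme: 5 bytes of padding just before the entry point and a 2-byte first instruction (`mov edi, edi`, encoded 8B FF). The function names are hypothetical, and the buffer only stands in for a writable code page. A real patcher must perform the second step as a single naturally aligned 2-byte store; with a run of single-byte NOPs at the entry instead, no single store is safe, because another thread may already be partway through the run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout (classic Windows hot-patch scheme): PAD_BYTES of padding
   right before the entry point and a 2-byte first instruction at the entry. */
enum { PAD_BYTES = 5, SHORT_JMP_LEN = 2 };

/* Step 1: write `jmp rel32` (E9 followed by a little-endian rel32) into the
   padding. No thread executes the padding, so this write need not be atomic.
   `entry` and `thunk` are offsets into the same buffer. */
static void write_thunk_jump(uint8_t *code, size_t entry, size_t thunk) {
    assert(entry >= PAD_BYTES);
    size_t at = entry - PAD_BYTES;
    /* rel32 is relative to the end of the 5-byte jmp, which is the entry point. */
    uint32_t rel = (uint32_t)(int32_t)((int64_t)thunk - (int64_t)entry);
    code[at] = 0xE9;
    for (int i = 0; i < 4; i++)
        code[at + 1 + i] = (uint8_t)(rel >> (8 * i));
}

/* Step 2: replace the 2-byte entry instruction with `jmp short` back to the
   padding. rel8 is relative to the end of the short jmp: -(2 + 5) = -7 (F9). */
static void redirect_entry(uint8_t *code, size_t entry) {
    int64_t rel8 = (int64_t)(entry - PAD_BYTES) - (int64_t)(entry + SHORT_JMP_LEN);
    uint8_t insn[SHORT_JMP_LEN] = {0xEB, (uint8_t)(int8_t)rel8};
    /* memcpy stands in for what must be one aligned 2-byte store on real code,
       so other threads see either the old instruction or the new one. */
    memcpy(code + entry, insn, sizeof insn);
}
```

Since `mov edi, edi` does nothing, the thunk can jump back to entry + 2 without replaying it; if the first instruction does real work, the thunk has to carry a relocated copy of it, as the comment mentions.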