
I am modelling a custom MOV instruction for the x86 architecture in the gem5 simulator. To test its implementation on the simulator, I need to compile my C code, using inline assembly, into a binary. But since it is a custom instruction that has not been implemented in GCC, the compiler will throw an error. I know one way is to extend GCC to accept my custom x86 instruction, but I don't want to do that yet because it is more time-consuming (I will do it afterwards).

As a temporary hack (just to check whether my implementation is worth it), I want to take an existing MOV instruction and change its underlying "micro-ops" in the simulator, so as to trick GCC into accepting and compiling my "custom" instruction.

There are many types of MOV instructions in the x86 architecture, as listed in the x86 architecture reference.

So my question is: which MOV instruction is the least used, so that I can edit its underlying micro-ops? Assume my workload only uses integers, i.e. it most probably won't use the XMM and MMX registers, and my instruction mirrors the implementation of a normal MOV instruction.

Asked by newww; edited by Peter Cordes.

1 Answer


Your best bet is a regular mov with a prefix that GCC will never emit on its own, i.e. create a new mov encoding that puts a mandatory prefix in front of an otherwise ordinary mov. That's like how lzcnt is rep bsr.
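A decoder then has to look at the prefix before deciding what the opcode means, exactly as it already does for lzcnt (F3 0F BD) versus bsr (0F BD). A minimal sketch of that logic as a toy decoder; the enum and function names are hypothetical, only one optional F2/F3 prefix is handled, and this is not gem5's decoder:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical instruction kinds for this toy decoder.
enum class InsnKind { Bsr, Lzcnt, Mov, CustomMov, Other };

// Classify a byte sequence. Assumes at most one legacy prefix (F2 or F3)
// and ignores REX, 66, ModRM decoding, etc.
InsnKind classify(const std::vector<uint8_t>& bytes) {
    std::size_t i = 0;
    uint8_t prefix = 0;
    if (i < bytes.size() && (bytes[i] == 0xF2 || bytes[i] == 0xF3)) {
        prefix = bytes[i++];
    }
    if (i >= bytes.size()) return InsnKind::Other;

    // Existing case: F3 in front of 0F BD turns bsr into lzcnt.
    if (bytes[i] == 0x0F && i + 1 < bytes.size() && bytes[i + 1] == 0xBD) {
        return prefix == 0xF3 ? InsnKind::Lzcnt : InsnKind::Bsr;
    }
    // New case: F2 in front of 89 (mov r/m, r) selects the custom mov.
    if (bytes[i] == 0x89) {
        return prefix == 0xF2 ? InsnKind::CustomMov : InsnKind::Mov;
    }
    return InsnKind::Other;
}
```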

Or, if you're modifying GCC and as (the GNU assembler), you can add a new mnemonic that just uses otherwise-invalid (in 64-bit mode) single-byte opcodes for the memory-source, memory-destination, and immediate-source versions of mov. AMD64 freed up several opcodes, including the BCD instructions like AAM, and push/pop of most segment registers. (x86-64 can still mov to/from Sregs, but there's just 1 opcode per direction, not 2 per Sreg for push ds/pop ds etc.)
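As a sketch, these are the freed single-byte opcodes mentioned above (push/pop of ES/CS/SS/DS plus the BCD instructions), all of which are invalid in 64-bit mode. The CustomMovOpcodes assignment is purely hypothetical, just to show one choice per basic mov form; note that C4/C5 (LES/LDS) are not in the list because AMD64 and later VEX reused them.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Single-byte opcodes that are invalid in 64-bit mode and were not reused.
constexpr std::array<uint8_t, 13> kFreeIn64 = {
    0x06, 0x07,  // push es / pop es
    0x0E,        // push cs
    0x16, 0x17,  // push ss / pop ss
    0x1E, 0x1F,  // push ds / pop ds
    0x27, 0x2F,  // daa / das
    0x37, 0x3F,  // aaa / aas
    0xD4, 0xD5,  // aam / aad
};

constexpr bool is_free_in_64bit(uint8_t op) {
    for (std::size_t i = 0; i < kFreeIn64.size(); ++i) {
        if (kFreeIn64[i] == op) return true;
    }
    return false;
}

// Hypothetical opcodes for a new mnemonic, one per basic mov form.
struct CustomMovOpcodes {
    uint8_t rm_from_reg;  // like 89: mov r/m, r
    uint8_t reg_from_rm;  // like 8B: mov r, r/m
    uint8_t rm_from_imm;  // like C7: mov r/m, imm
};
constexpr CustomMovOpcodes kCustomMov = {0xD4, 0xD5, 0x06};

static_assert(is_free_in_64bit(kCustomMov.rm_from_reg), "must be free");
static_assert(is_free_in_64bit(kCustomMov.reg_from_rm), "must be free");
static_assert(is_free_in_64bit(kCustomMov.rm_from_imm), "must be free");
```

Covering byte operand-size as well as 16/32/64-bit would need a second set of three.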

Assuming my workload just includes integers i.e. most probably wont be using the xmm and mmx registers

Bad assumption for XMM: GCC aggressively uses 16-byte movaps / movups instead of copying structs 4 or 8 bytes at a time. It's not at all rare to find vector mov instructions in scalar integer code as part of inline expansion of small known-length memcpy or struct / array init. Also, those mov instructions have at least 2-byte opcodes (SSE1 0F 28 movaps, so a prefix in front of plain mov is the same size as your idea would have been).
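For example, a struct of nothing but integers still tends to get copied through an XMM register. With optimization enabled on x86-64, GCC usually compiles the copy below to a single 16-byte movdqu/movups load and store pair rather than two 8-byte integer movs. The exact instructions depend on the GCC version and flags, so check the asm output (e.g. on Godbolt):

```cpp
#include <cstdint>

// 16 bytes of plain integers: no float or vector types in the source.
struct Pair {
    int64_t a;
    int64_t b;
};

// Typically becomes an SSE 16-byte load + store at -O2 on x86-64.
void copy_pair(Pair* dst, const Pair* src) {
    *dst = *src;
}
```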

However, you're right about MMX regs. I don't think modern GCC will ever emit movq mm0, mm1 or use MMX at all, unless you use MMX intrinsics. Definitely not when targeting 64-bit code.

Also, mov to/from control registers (0F 20/22 /r) and debug registers (0F 21/23 /r) both use the mov mnemonic, but GCC will definitely never emit either on its own. They're only available with a general-purpose register as the operand that isn't the control or debug register. So that's technically the answer to your title question, but probably not what you actually want.
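To see how many different opcodes share the mov mnemonic, here is a rough lookup sketch (opcode bytes only; prefixes are ignored, and details such as C6/C7 requiring ModRM.reg = 0 are not checked). The function name and descriptions are just for illustration:

```cpp
#include <cstdint>
#include <string>

// Returns a short description if the opcode is a mov form, else "".
// `second` is only used when `first` is the 0F escape byte.
std::string mov_form(uint8_t first, uint8_t second = 0) {
    if (first == 0x0F) {
        switch (second) {
            case 0x20: return "mov r, CRn";
            case 0x22: return "mov CRn, r";
            case 0x21: return "mov r, DRn";
            case 0x23: return "mov DRn, r";
            default:   return "";
        }
    }
    switch (first) {
        case 0x88: return "mov r/m8, r8";
        case 0x89: return "mov r/m, r";
        case 0x8A: return "mov r8, r/m8";
        case 0x8B: return "mov r, r/m";
        case 0x8C: return "mov r/m, Sreg";
        case 0x8E: return "mov Sreg, r/m";
        case 0xC6: return "mov r/m8, imm8";
        case 0xC7: return "mov r/m, imm";
        default:   break;
    }
    if (first >= 0xA0 && first <= 0xA3) return "mov with moffs address";
    if (first >= 0xB0 && first <= 0xB7) return "mov r8, imm8 (reg in low 3 bits)";
    if (first >= 0xB8 && first <= 0xBF) return "mov r, imm (reg in low 3 bits)";
    return "";  // not a mov opcode
}
```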


GCC doesn't parse its inline asm template string, it just includes it in its asm text output to feed to the assembler after substituting for %number operands. So GCC itself is not an obstacle to emitting arbitrary asm text using inline asm.

And you can use .byte to emit arbitrary machine code.
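A minimal sketch of both points: the .byte directive passes straight through GCC to the assembler, which then emits the F2 byte in front of the normally-encoded mov. The function name is hypothetical; the non-x86 fallback is only there so the snippet compiles everywhere. On real hardware the F2 in front of mov is ignored in practice (see below), so this runs as a plain store:

```cpp
// Emits F2 by hand, then lets the assembler encode a 32-bit store.
void store_with_byte_prefix(int* dst, int src) {
#if defined(__x86_64__) || defined(__i386__)
    asm(".byte 0xf2\n\t"   // same byte as a `repne;` prefix
        "movl %1, %0"
        : "=m"(*dst)
        : "r"(src));
#else
    *dst = src;  // non-x86 fallback
#endif
}
```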

Perhaps a good option would be to use a 0E byte as a prefix for the special mov encoding that you're going to make gem5 decode specially. 0E is push CS in 32-bit mode and invalid in 64-bit mode. GCC will never emit either.

Or just an F2 repne prefix; GCC will never emit repne in front of a mov opcode (where it doesn't apply), only movs. (F3 rep / repe means xrelease when used on a memory-destination instruction so don't use that. https://www.felixcloutier.com/x86/xacquire:xrelease says that F2 repne is the xacquire prefix when used with locked instructions, which doesn't include mov to memory so it will be silently ignored there.)
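The trade-offs between the candidate marker bytes can be summarized as a small lookup. This is just an illustrative encoding of the reasoning above (the enum and function names are hypothetical), describing what a real CPU does with each marker in front of a mov:

```cpp
#include <cstdint>

// What a real CPU does with each candidate marker in front of a mov.
enum class OnRealCpu { Ignored, TreatedAsXrelease, Faults, PushesCs, Unknown };

OnRealCpu marker_effect(uint8_t marker, bool mem_dest, bool mode64) {
    switch (marker) {
        case 0xF2:  // repne: xacquire only applies to locked instructions
            return OnRealCpu::Ignored;
        case 0xF3:  // rep: xrelease on a memory-destination mov
            return mem_dest ? OnRealCpu::TreatedAsXrelease : OnRealCpu::Ignored;
        case 0x0E:  // push cs in 32-bit mode, #UD in 64-bit mode
            return mode64 ? OnRealCpu::Faults : OnRealCpu::PushesCs;
        default:
            return OnRealCpu::Unknown;
    }
}
```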

As usual, prefixes that don't apply have no documented behaviour, but in practice CPUs that don't understand a rep / repne ignore it. Some future CPU might understand it to mean something special, and that's exactly what you're doing with gem5.

Picking .byte 0x0e; instead of repne; might be a better choice if you want to guard against accidentally leaving these prefixes in a build you run on a real CPU. (It will #UD -> SIGILL in 64-bit mode, or usually crash from messing up the stack in 32-bit mode.) But if you do want to be able to run the exact same binary on a real CPU, with the same code alignment and everything, then an ignored REP prefix is ideal.


Using a prefix in front of a standard mov instruction has the advantage of letting the assembler encode the operands for you:

template<class T>
void fancymov(T& dst, T src) {
    // fixme: imm -> mem  needs a size suffix, defeating template
    // unless you use Intel-syntax where the operand includes "dword ptr"
    asm("repne; movl  %1, %0"
#if 1
       : "=m"(dst)
       : "ri" (src)
#else
       : "=g,r"(dst)
       : "ri,rmi" (src)
#endif
       : // no clobbers
    );
}

void test(int *dst, long src) {
    fancymov(*dst, (int)src);
    fancymov(dst[1], 123);
}

(Multi-alternative constraints let the compiler pick either reg/mem destination or reg/mem source. In practice it prefers the register destination even when that will cost it another instruction to do its own store, so that sucks.)

On the Godbolt compiler explorer, for the version that only allows a memory-destination:

test(int*, long):
        repne; movl  %esi, (%rdi)       # F2 89 37
        repne; movl  $123, 4(%rdi)      # F2 C7 47 04 7B 00 00 00
        ret

If you wanted this to be usable for loads, I think you'd have to make 2 separate versions of the function and use the load version or store version manually, where appropriate, because GCC seems to want to use reg,reg whenever it can.
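A sketch of such a separate load version, assuming x86-64 and integer T up to 8 bytes (the name fancyload is hypothetical). The "m" input constraint forces a memory source so GCC can't pick reg,reg, and no size suffix is needed because the assembler infers the operand size from the register destination. It uses .byte 0xf2, the same byte as repne;:

```cpp
// Load-only variant: memory source, register destination.
template<class T>
T fancyload(const T& src) {
    T dst;
#if defined(__x86_64__)
    asm(".byte 0xf2\n\t"   // same byte as `repne;`
        "mov %1, %0"
        : "=r"(dst)
        : "m"(src));
#else
    dst = src;  // non-x86-64 fallback
#endif
    return dst;
}
```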


Or with the version allowing register outputs (or another version that returns the result as a T, see the Godbolt link):

test2(int*, long):
        repne; mov  %esi, %esi
        repne; mov  $123, %eax
        movl    %esi, (%rdi)
        movl    %eax, 4(%rdi)
        ret
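A sketch of a version that returns the result as a T, assuming x86-64 (the name fancymov_ret is hypothetical). The source can be a register, memory, or immediate, and because the destination is always a register, the assembler can infer the operand size, which sidesteps the size-suffix problem from the fixme comment:

```cpp
// Returns the moved value; GCC picks reg, mem, or imm for the source.
template<class T>
T fancymov_ret(T src) {
    T dst;
#if defined(__x86_64__)
    asm(".byte 0xf2\n\t"   // same byte as `repne;`
        "mov %1, %0"
        : "=r"(dst)
        : "rmi"(src));
#else
    dst = src;  // non-x86-64 fallback
#endif
    return dst;
}
```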
Answer by Peter Cordes.
  • I have been trying to use the prefix method that you advised, but it just adds complications to gem5 in terms of decoding the instruction. If I were to add a new instruction mnemonic to the GCC compiler, how would I add it, supposing that, as suggested, I assign the new mnemonic the opcode of AAM? (I think this might be simpler for me.) – newww Mar 23 '20 at 08:56
  • @newww: Sure, you could do that. IDK how to tweak GCC internals to emit it, though. If you want to replace every `mov`, you're going to need at least 3 opcodes for `mov r/m, r` ; `mov r, r/m` and `mov r/m, imm`. Actually make that 6: all of those for both byte vs. word/dword/qword operand-size (selected by `66` or REX prefix). x86 also has no-modrm encodings for mov-immediate to register (byte and dword), where the register number is the low 3 bits of the opcode byte. You could make GCC not use those. – Peter Cordes Mar 23 '20 at 09:15
  • Can we use a legacy prefix with a one-byte opcode? The instructions I reviewed that use a legacy prefix were either x87 instructions or had a two-byte opcode. So is it okay to make a custom instruction using a prefix and a one-byte opcode? – newww Mar 25 '20 at 15:13
  • Also, if I use a prefix together with the opcode for the `mov` instruction, then in gem5 I can decode it as my custom mov instruction, i.e. if it has the `F2` prefix followed by the `88` or `89` opcode, it will refer to my custom instruction, right? – newww Mar 25 '20 at 15:18
  • @newww: look at instructions like [`pause` (`F3 90` = `rep nop`)](https://www.felixcloutier.com/x86/pause) or [`tzcnt` (`rep bsf`)](https://www.felixcloutier.com/x86/tzcnt) where the encoding is a mandatory prefix that changes the meaning of an existing opcode. So yes, x86 machine code already has this kind of shenanigans that make it tricky to decode, using F3 prefixes. No reason you can't use F2 or 0E. The only challenge is getting GEM5's decoder function to decode it as a different instruction than without the prefix. – Peter Cordes Mar 25 '20 at 15:19
  • @newww: Or even worse: VEX prefixes overlap with invalid encodings of LDS and LES in 32-bit mode. That's why some of the bits are inverted: to make sure that no 32-bit mode VEX prefix could be a *valid* LDS or LES. https://en.wikipedia.org/wiki/VEX_prefix#Technical_description. – Peter Cordes Mar 25 '20 at 15:23
  • Hi, I've been trying to do the same thing, that is, add another special mov instruction with a legacy prefix that is unlikely to be emitted by GCC. Looking at the decoder code, I didn't understand how LEGACY_DECODEVAL is used. The decoding is as follows: (no prefix) -> 0x0; operand size (0x66) -> 0x1; 0xF3 -> 0x4; 0xF2 -> 0x8. Any idea how this works? I am trying to use the one-byte opcode MOV, not the two-byte opcode MOV. – gPats May 28 '22 at 20:00
  • @gPats: Sorry, I don't know the GCC or GAS internals related to emitting x86 machine code, or Qemu or Bochs source code that decodes it manually. But it sounds like constants that could be used to record a bitmap of which prefixes were seen or are needed before the opcode of a given instruction. – Peter Cordes May 28 '22 at 20:21
  • @Peter Cordes Oh sorry, I should have mentioned that LEGACY_DECODEVAL is gem5 decoder code. I am using inline assembly right now to generate the instruction I want. I simply can't get my head around the gem5 code [link](https://github.com/gem5/gem5/blob/e4fae58da6c044b6efec62392ff99f343ce67947/src/arch/x86/isa/decoder/two_byte_opcodes.isa#L213) here, though. – gPats May 28 '22 at 20:26
  • So an instruction like repne; mov will be a one-byte opcode with the legacy prefix repne. I wish to use that as the new special instruction, as GCC is unlikely to emit it (if I'm not wrong). – gPats May 28 '22 at 20:30
  • @gPats: Looks like a data file that defines how to decode the opcode pattern `0F 1x`, with the low 3 bits of the opcode treated separately. (That makes sense because some x86 opcodes use the low 3 bits as a register number, like `push`.) `0x02 << 3` is `0x10`, which is the opcode for `movups` / `movss` / `movupd` depending on prefixes: http://ref.x86asm.net/coder32.html#x0F10. So that's the `0x0: decode OPCODE_OP_BOTTOM3` case. `0F 13` is `movlps` (or `movlpd` with a prefix). It's possible GEM5 just lets a `66` prefix be ignored in cases like `movlps` / `movlpd` behaving identically. – Peter Cordes May 28 '22 at 20:35
  • @gPats: There are lots of different 1-byte opcodes for `mov`. Yeah, you could pick one (or more) and define a meaning for `F2 89 modrm` or whatever if you want. That would be a `decode OPCODE_OP_TOP5` of `0x11`, presumably in a different file since that one's named for `two_byte_opcodes.isa` but you're talking about a one-byte opcode, not `0F xx` 2-byte opcode. I've never looked at GEM5 source code before, but knowing x86 machine code it looks sensible. – Peter Cordes May 28 '22 at 20:39
  • @PeterCordes I see what you're saying here, but I still don't get how LEGACY_DECODEVAL is getting decoded. As you can see at the top of the file, we first start by decoding the TOP5 bits of opcode 0F **1x**, which leads us to the 0x02 case. Then we decode the prefix bits (not the opcode), which uses a mapping I don't understand. Then we move on to decoding the bottom 3 bits of the opcode. Sorry, I am relatively inexperienced at this. Thanks for your patience. If you want, we can move this to a chat room. – gPats May 28 '22 at 21:03
  • @gPats: That's a GEM5 internals question, not an x86 question. I've never used GEM5; what I wrote earlier is just what I could figure out from looking at what's in the file. It does look like different prefixes set different bits in the `LEGACY_DECODEVAL` value, so if you wanted to add an instruction that used a different combination of prefixes, you'd maybe add a `0xc:` entry for F2 F3 (or F3 F2). Or if you wanted to add prefix decoding for an opcode that didn't previously do any, you'd add a `0x11 0x0: decode LEGACY_DECODEVAL { 0: original stuff` / `0x8: your stuff` `}`. – Peter Cordes May 28 '22 at 21:10
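The prefix-bitmap idea from this thread can be modelled in plain C++. A toy sketch, assuming the bit values gPats reported (66 -> 0x1, F3 -> 0x4, F2 -> 0x8). It is not gem5 code, which is written in gem5's own ISA description language, and the names below are hypothetical. It inspects the same three pieces the nested decode blocks use: TOP5, the prefix bitmap, and BOTTOM3. Only opcode 89 is modelled, and REX prefixes are not handled.

```cpp
#include <cstdint>
#include <string>
#include <vector>

constexpr unsigned kOpSize = 0x1;  // 66
constexpr unsigned kRep    = 0x4;  // F3
constexpr unsigned kRepne  = 0x8;  // F2

struct Decoded {
    unsigned legacy = 0;  // bitmap of legacy prefixes seen
    uint8_t opcode = 0;   // first non-prefix byte
};

// Collect legacy prefixes into a bitmap, stop at the opcode byte.
Decoded scan_prefixes(const std::vector<uint8_t>& bytes) {
    Decoded d;
    for (uint8_t b : bytes) {
        if (b == 0x66)      d.legacy |= kOpSize;
        else if (b == 0xF3) d.legacy |= kRep;
        else if (b == 0xF2) d.legacy |= kRepne;
        else { d.opcode = b; break; }
    }
    return d;
}

std::string decode_one_byte(const Decoded& d) {
    const unsigned top5 = d.opcode >> 3;      // 0x89 >> 3 == 0x11
    const unsigned bottom3 = d.opcode & 0x7;  // 0x89 & 7 == 1
    if (top5 != 0x11 || bottom3 != 1) return "UNHANDLED";
    switch (d.legacy & ~kOpSize) {  // 66 only changes operand size
        case kRepne: return "CUSTOM_MOV";  // new meaning for F2 89
        default:     return "MOV";         // no prefix, or one mov ignores
    }
}
```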