But isn't there an efficient way to store a table for the mnemonics without duplicates?
This has become an algorithms and data structures question.
As you point out, many of the opcode table entries (at least for the table without a 0f
escape byte: http://sparksandflames.com/files/x86InstructionChart.html) do repeat in groups of 4 or 2, i.e. with the same 6 or 7-bit prefix selecting the same mnemonic.
Obviously a 256-entry table of structs is simple, but duplicates things. It's very fast and easy to use, since it's probably still small enough not to cache-miss very often. (Especially since the common entries will stay hot in cache; x86 code uses the same opcodes a lot.)
You can trade simplicity / performance for space.
You could have a 64-entry table of structs where one member is a pointer to a secondary table to be indexed with the low 2 bits. If the pointer is NULL, it means the instruction follows the pattern of add
/ and
/ xor
/ etc. where the low 2 bits tell you 8 bit vs. whatever the operand-size is and direction (r/m,reg or reg,r/m).
Your struct would also need entries for turning into other instructions when certain prefixes are present (e.g. rep nop
is pause
). Also, AVX VEX prefixes use what used to be an invalid encoding of another instruction. x86 is pretty crazy to decode if you want to do a complete job for all the current extensions.
Actually, it might be simplest (and also efficient) to just use a table of function pointers. Or a struct with a const char* mnemonic
and a int (*decode)(const char*mnemonic, const char *insn_bytes, unsigned prefix_bitmap)
function, so lots of opcodes can point to the same decode-function but still get different mnemonics. Sometimes the function will ignore the passed mnemonic, but other times that's all it needs. You'd have a common function for decoding addressing modes that many of the decode functions would call.
This is fairly similar to how you might implement an x86 emulator that interprets, instead of doing dynamic recompilation. A common decode loop and then dispatching through function pointers.
An even more complicated data structure you might use is a radix trie aka prefix tree. See also https://en.wikipedia.org/wiki/Trie#Bitwise_tries.
This is getting into silly season, because the density is so high that a lookup table makes much more sense. (There are very few undefined opcode).