
I'd like to assemble an x86 file while ensuring that the code will run on a given processor, without having to test it on a processor emulator.

Is there a tool/technique which would allow me to do some sort of x86 instruction classification according to the oldest required processor that supports it, or at least to warn me if incompatible instructions are being used?

In short, I'm looking for an automated version of this Wikipedia table of x86 instruction listings, to help me check if a given code should be compatible with a given processor.

anol
  • Instructions present in the assembly/binary code are not necessarily executed. Most programs that take advantage of modern instructions also contain a backwards-compatible version of the optimized functions and execute one or the other depending on the processor detected at run-time. – Pascal Cuoq Jul 20 '15 at 07:04
  • In this case, a conservative estimation taking into account *every instruction in the file* would be good enough for me. I have code running on old processors that spews errors such as "illegal opcode/instruction" and resets, and I'd like to be able to understand why without having to manually check each instruction against a table, or having to run it through a series of processor emulators to see which one fails. – anol Jul 20 '15 at 07:14
  • Definitive classification of instructions in binary code (executable or object code) is impossible, because it would mean solving [**the halting problem**](https://en.wikipedia.org/wiki/Halting_problem), which is proven **undecidable**. It is not even possible to definitively distinguish code from data, for the same reason. Of course there are heuristics and simpler methods that may or may not work, such as the linear sweep method used by some disassemblers. – nrz Jul 20 '15 at 07:23
  • Updated the question to consider only assembly files. I'm not interested in an exact classification, but in an over-approximation, which is decidable. Apparently there is no such tool, which probably means it would not be as useful as I believe, but it would still be possible to make one, considering only `.text` sections and so on. @PascalCuoq's comment about run-time dependent code execution could be stated as a negative answer to my question, for instance. – anol Jul 20 '15 at 07:28
  • Alas, I'm not aware of any tool that classifies x86 instructions. Anyway, I strongly believe that such a classification can be done, that it has nothing to do with the halting problem, and that it is nothing esoteric. IDA Pro (or any good disassembler) could do it, for example. Maybe it does; I don't have IDA right now. –  Jul 20 '15 at 07:35
  • I've reworded the question to take into account your remarks and @Michael's answer, in a way that I hope will be useful for other people. – anol Jul 20 '15 at 07:43

2 Answers


In short, I'm looking for an automated version of this Wikipedia table of x86 instruction listings, to help me check if a given code should be compatible with a given processor.

You could emit a temporary assembly file with the following directive:

[CPU level]

Where level is one of:

  • 8086 Assemble only 8086 instruction set
  • 186 Assemble instructions up to the 80186 instruction set
  • 286 Assemble instructions up to the 286 instruction set
  • 386 Assemble instructions up to the 386 instruction set
  • 486 486 instruction set
  • 586 Pentium instruction set
  • PENTIUM Same as 586
  • 686 P6 instruction set
  • PPRO Same as 686
  • P2 Same as 686
  • P3 Pentium III (Katmai) instruction sets
  • KATMAI Same as P3
  • P4 Pentium 4 (Willamette) instruction set
  • WILLAMETTE Same as P4
  • PRESCOTT Prescott instruction set
  • X64 x86-64 (x64/AMD64/Intel 64) instruction set
  • IA64 IA64 CPU (in x86 mode) instruction set

followed by your code. Then invoke NASM to assemble that file, and observe the exit status and error message from NASM. There are similar directives for TASM/MASM in case you're not using NASM.


An example:

test8086.asm

[cpu 8086]
cmovne eax,ebx  ; Not a part of the 8086 instruction set


C:\nasm>nasm -f bin -o test8086.com test8086.asm
test8086.asm:2: error: no instruction for this cpu level

C:\nasm>echo %errorlevel%
1
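
The steps above can be scripted. The following is only a minimal sketch, assuming a POSIX-like shell with `mktemp -d` available and `nasm` on the PATH; the function names `make_probe` and `check_cpu_level` are made up for illustration and are not part of NASM.

```shell
#!/bin/sh
# Sketch: prepend a NASM CPU directive to a copy of an assembly file,
# then try to assemble the copy. Function names are hypothetical.

# make_probe LEVEL SOURCE
# Print SOURCE preceded by a "[cpu LEVEL]" line.
make_probe() {
    printf '[cpu %s]\n' "$1"
    cat "$2"
}

# check_cpu_level LEVEL SOURCE
# Assemble the probe in a scratch directory and return NASM's exit status
# (0 if every instruction was accepted at that CPU level).
check_cpu_level() {
    dir=$(mktemp -d) || return 2
    make_probe "$1" "$2" > "$dir/probe.asm"
    # -f bin matches the example above; change it to suit your source.
    nasm -f bin -o "$dir/probe.bin" "$dir/probe.asm"
    status=$?
    rm -rf "$dir"
    return "$status"
}
```

For example, `check_cpu_level 8086 test.asm || echo "needs a newer CPU"`. Because one line is prepended, the line numbers in NASM's error messages are one higher than in the original file.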
Michael
  • Excellent. I've been able to do it with GNU `as` by adding an `rdtsc` instruction and then trying to assemble it with `as --32 -march=i386 file.s`, and it correctly rejected it with `Error: 'rdtsc' is not supported on 'i386'`. – anol Jul 20 '15 at 07:38
  • I think yasm can do this too. x264 uses this feature to make sure that a function declared as SSSE3 doesn't actually use any instructions from SSE4. – Peter Cordes Jul 20 '15 at 13:42

I ran into this problem when booting 32bit Ubuntu GNU/Linux on an old Athlon XP. A few programs died with SIGILL (illegal instruction).

I assume Ubuntu compiles even 32bit code with -mfpmath=sse, and the programs that crashed were using double-precision floating point (i.e. SSE2). Athlon XP doesn't support SSE2. AMD64 k8 CPUs were the first AMD CPUs to support it.

Look for movsd / addsd / comisd in the disassembly (s = scalar, d = double). There could also be movapd / movupd / addpd / etc. (p = packed). grep (or search in less) for `[sp]d .*%xmm`, and that should probably find any SSE2 double-precision instructions. Packed 32-bit integer instructions also tend to end with d (e.g. pshufd), but those are SSE2 or higher as well.
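
As a rough sketch of that search, assuming AT&T-syntax disassembly such as the default output of `objdump -d` (the function name `find_sse2_double` is made up for illustration):

```shell
#!/bin/sh
# Sketch: filter AT&T-syntax disassembly on stdin, keeping lines whose
# mnemonic ends in sd/pd and that mention an %xmm register.
# This is a heuristic: it can miss instructions or match data that the
# disassembler decoded as code.
find_sse2_double() {
    grep -E '[sp]d .*%xmm'
}
```

Typical use would be `objdump -d ./prog | find_sse2_double | less`, where `./prog` stands for the binary that crashed.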

As @nrz correctly points out, not every instruction in a program will run. Also, some parts of the .text segment may actually be data, not code. Still, look for the CPUID instruction in disassembly output to see if the program checks what kind of CPU it's running on.
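
A small sketch of the CPUID check, again assuming disassembly text on stdin (for instance from `objdump -d`); the name `has_cpuid` is hypothetical:

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) if the disassembly on stdin contains
# "cpuid" as a separate word, i.e. as a mnemonic rather than as part of
# a longer symbol name.
has_cpuid() {
    grep -Eq '(^|[[:space:]])cpuid([[:space:]]|$)'
}
```

For example, `objdump -d ./prog | has_cpuid && echo "may dispatch on CPU features"`. A hit only suggests run-time detection; it does not show which code paths are guarded by it.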

I do like @Michael's idea of disassembling to a temp file, adding a CPU limitation, and then checking for errors when you assemble.

Peter Cordes