
I need to convert x86 assembly source code into a human-readable LLVM .ll file (aka LLVM assembly language). How can I do this? If there is no direct solution, would it be possible to implement one within the LLVM infrastructure with as little effort as possible?

I guess the solution I'm looking for would be some kind of counterpart to llc that converts a .s file back into the .ll representation.
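To make the idea concrete, here is a minimal sketch of what such a reverse llc would have to do for straight-line code. It is not based on any LLVM API: `lift` is a hypothetical toy function that handles only an assumed tiny subset of Intel-syntax x86 (`mov`/`add`/`sub` on four 32-bit registers, plus `ret`), models each register as a stack slot, ignores EFLAGS and memory operands, and assumes the result is returned in eax. The emitted IR uses opaque `ptr` types (LLVM 15 and later).

```python
# Hypothetical toy lifter: a tiny subset of Intel-syntax x86 -> textual LLVM IR.
# Supported: mov r32, r32/imm | add r32, r32/imm | sub r32, r32/imm | ret
# Not modeled: EFLAGS, memory operands, calls, jumps.

REGS = ("eax", "ebx", "ecx", "edx")
BINOPS = {"add": "add", "sub": "sub"}  # x86 mnemonic -> LLVM opcode


def lift(asm_lines, fn_name="lifted"):
    out = [f"define i32 @{fn_name}() {{", "entry:"]
    # Each register becomes a stack slot; mem2reg can later promote it to SSA.
    for r in REGS:
        out.append(f"  %{r} = alloca i32")
        out.append(f"  store i32 0, ptr %{r}")  # assumption: registers start at 0
    counter = 0

    def fresh():
        # Return a new, unique SSA temporary name.
        nonlocal counter
        t = f"%t{counter}"
        counter += 1
        return t

    def read(operand):
        # Registers are loaded from their slot; anything else must be an immediate.
        if operand in REGS:
            t = fresh()
            out.append(f"  {t} = load i32, ptr %{operand}")
            return t
        return str(int(operand, 0))  # decimal or 0x-prefixed hex

    for raw in asm_lines:
        line = raw.split(";")[0].strip().lower()  # drop assembler comments
        if not line:
            continue
        mnemonic, _, rest = line.partition(" ")
        ops = [o.strip() for o in rest.split(",")] if rest else []
        if mnemonic in ("mov", *BINOPS):
            dst, src = ops
            if dst not in REGS:
                raise NotImplementedError(f"unsupported destination: {dst}")
            if mnemonic == "mov":
                value = read(src)
            else:
                a, b = read(dst), read(src)
                value = fresh()
                out.append(f"  {value} = {BINOPS[mnemonic]} i32 {a}, {b}")
            out.append(f"  store i32 {value}, ptr %{dst}")
        elif mnemonic == "ret":
            # Assumption: the return value lives in eax (cdecl-like convention).
            out.append(f"  ret i32 {read('eax')}")
            break  # nothing may follow a terminator in this single block
        else:
            raise NotImplementedError(f"unsupported instruction: {line}")
    out.append("}")
    return "\n".join(out)


print(lift(["mov eax, 5", "mov ebx, 7", "add eax, ebx", "ret"]))
```

Running the output through a pass such as mem2reg would turn the stack slots into SSA values. Everything this sketch ignores (flags, memory, calls, and above all control flow) is where the real difficulty lies, as discussed in the comments.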

Edited by JasonMArcher; asked by bsa2000
  • This question has already been asked and answered. There is no direct solution, for many reasons (e.g. indirect branches). You might find projects like llvm-qemu and libcpu useful. In any case, this question is a duplicate of http://stackoverflow.com/questions/6981810/translation-of-machinecode-into-llvm-ir-disassembly-reassembly-of-x86-64-x86 – Anton Korobeynikov Jan 26 '12 at 09:03
  • Thank you. I've already taken a look at the projects you mentioned. Unfortunately, _llvm-qemu_ looks dead, and _libcpu_ seems to go its own way in parsing assembly rather than using LLVM's infrastructure (so its support for the x86 ISA appears to be incomplete). Actually, I thought the tool I'm looking for would do the work of LLVM's **AsmPrinter** in reverse, translating native ISA instructions into LLVM's _MachineInstr_ or LLVM-MC's _MCInst_. – bsa2000 Jan 26 '12 at 09:22
  • And what about the LLVM subproject **llvm-mc**? It has an _AsmParser_ class that can consume a .s file and produce a representation based on the _MCInst_ class. In that case, the only remaining part would be to go in the reverse direction of the _MCLowering_ class, back toward LLVM's _MachineInstr_-based representation. – bsa2000 Jan 26 '12 at 09:42
  • MachineInstr != LLVM IR. MI is still machine code. Suppose, e.g., you have a "jmp [eax]" instruction. Which LLVM IR instruction(s) would you convert it into? – Anton Korobeynikov Jan 26 '12 at 10:52
  • For example, I would be interested in an x86/x86_64 -> LLVM converter even with restrictions: one that can disassemble only a limited set of x86/x86_64 instructions, but enough to reassemble hello world and some computational algorithms. – Grzegorz Wierzowiecki Jan 31 '12 at 10:00
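The "jmp [eax]" question is the crux: the jump target is a value loaded from memory at run time, so no single LLVM IR instruction corresponds to it. LLVM's own `indirectbr` does not help directly, because its address operand must come from a `blockaddress` of a label in the same function, not a raw machine address. One common workaround, sketched here under stated assumptions, is to enumerate the targets a static analysis can recover (for instance from a jump table) and emit a `switch` on the loaded address, with a fallback block for anything missed. The function names, the `bb_<hex address>` label scheme, and the runtime hook `@__hypothetical_dispatch` are all hypothetical.

```python
def lift_indirect_jump(target, known_targets, fallback="unresolved"):
    """Emit an LLVM `switch` standing in for a lifted `jmp [eax]`.

    target        -- i32 SSA value holding the address loaded from [eax]
                     (assumption: an earlier lifting step produced it)
    known_targets -- native addresses recovered statically, e.g. from a jump table
    fallback      -- label of the block for addresses the analysis missed
    """
    lines = [f"  switch i32 {target}, label %{fallback} ["]
    # LLVM rejects duplicate case values, so deduplicate; sort for stable output.
    for addr in sorted(set(known_targets)):
        lines.append(f"    i32 {addr}, label %bb_{addr:x}")
    lines.append("  ]")
    return "\n".join(lines)


def fallback_block(target, fallback="unresolved"):
    """Block for unknown targets: hand the address to a runtime hook.

    @__hypothetical_dispatch is an invented name and needs a matching
    `declare void @__hypothetical_dispatch(i32)` in the module.
    """
    return "\n".join([
        f"{fallback}:",
        f"  call void @__hypothetical_dispatch(i32 {target})",
        "  unreachable",
    ])


print(lift_indirect_jump("%target", [0x401010, 0x401000]))
print(fallback_block("%target"))
```

Anything the analysis misses ends up in the fallback block, which is why a purely static converter cannot be complete in general.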

1 Answer


For those who are still looking for more information on this topic, I want to share an ongoing project I found on the web (http://dslab.epfl.ch/proj/s2e). The project has two components:

  1. An x86-to-LLVM backend for dynamic translation of x86 machine code to LLVM IR
  2. The RevGen tool for static analysis of x86 binaries, capable of translating inline x86 assembly to LLVM IR

Here is how the RevGen prototype works: RevGen takes an x86 binary as input and outputs an equivalent LLVM module in three steps. First, RevGen looks for all executable blocks of code and converts them into LLVM translation blocks. Second, when there are no more translation blocks to cover, RevGen transforms them into basic blocks and rebuilds the control flow graph of the original binary in LLVM format. Third, RevGen resolves external function calls to build the final LLVM module. For dynamic analysis, a final step links the LLVM module with a run-time library that allows the module to be executed.
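The second step is the one most relevant to the question. Below is a minimal sketch of it, not RevGen's actual code: `rebuild_cfg` is a hypothetical helper, and it assumes each translation block is given as a list of `(address, size)` instructions plus the successor addresses of its last instruction. Because separately discovered translation blocks can overlap (a jump into the middle of already translated code can start a new block there), the sketch splits them at every block start and branch target and records the control-flow edges. Indirect jumps with unknown targets are not handled.

```python
# Sketch of "translation blocks -> basic blocks + CFG" (not RevGen's code).
# Each translation block (TB) is (insns, succs): insns is a list of
# (address, size) pairs; succs are the successor addresses of its last insn.

def rebuild_cfg(tbs):
    size = {}          # instruction address -> length in bytes
    term_succs = {}    # address of a TB's last insn -> successor addresses
    leaders = set()    # addresses where a basic block must start
    for insns, succs in tbs:
        leaders.add(insns[0][0])   # every TB start begins a block
        leaders.update(succs)      # so does every branch target
        for addr, n in insns:
            size[addr] = n
        term_succs.setdefault(insns[-1][0], set()).update(succs)

    blocks = {}   # leader address -> {"insns": [...], "succs": set()}
    cur = None
    for addr in sorted(size):
        if cur is None or addr in leaders:
            cur = addr
            blocks[cur] = {"insns": [], "succs": set()}
        blocks[cur]["insns"].append(addr)
        fallthrough = addr + size[addr]
        if addr in term_succs:
            # Last insn of some TB: the block ends with that TB's edges.
            blocks[cur]["succs"] |= term_succs[addr]
            cur = None
        elif fallthrough in leaders:
            # Overlapping TBs: split here and fall through to the next block.
            blocks[cur]["succs"].add(fallthrough)
            cur = None
    return blocks


tbs = [
    ([(0x10, 2), (0x12, 3), (0x15, 2)], [0x30, 0x17]),  # ends in a conditional jump
    ([(0x12, 3), (0x15, 2)], [0x30, 0x17]),             # same code, entered at 0x12
    ([(0x17, 1)], []),                                   # ret
    ([(0x30, 2)], [0x12]),                               # jmp 0x12
]
for start, bb in sorted(rebuild_cfg(tbs).items()):
    print(hex(start), [hex(a) for a in bb["insns"]], sorted(hex(s) for s in bb["succs"]))
```

In this example, the two overlapping translation blocks starting at 0x10 and 0x12 become two basic blocks: 0x10, which falls through to 0x12, and 0x12, which ends in the conditional jump.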

Edited by osgx; answered by bsa2000