LLVM Backend: Replacing indirect jmps for x86 backend

Question

I want to replace each indirect jmp *(eax) instruction in the code with the sequence mov *(eax),ebx; jmp *ebx for x86 executables.

Before implementing this, I would like to make the LLVM compiler log a message every time it detects a jmp *(eax) instruction, by adding some print statements.

Then I want to move on to replacing the indirect jump with the two-instruction sequence.

From Google searches and articles, it seems I can probably achieve this by modifying the X86AsmPrinter in the LLVM backend, but I am not sure how to go about it. Any help or reading material would be appreciated.

Note: My actual requirement deals with indirect jumps and pop, but I want to start with this to understand the backend a bit better before I dive into anything more.

score 7 · Accepted Answer · answered May 04 '15 at 18:41

I am done with my project. Posting my approach for the benefit of others.

The main job of the LLVM backend is to convert the Intermediate Representation (IR) into the final executable for the target architecture and other specifications. The backend itself consists of several phases, which perform target-specific optimization, instruction selection, scheduling, and instruction emission. These phases are required because the IR is a very generic representation and needs many transformations before it becomes target-specific machine code.

1) Logging every time the compiler generates jmp *(eax)

We can achieve this by adding print statements to the instruction emitting/printing phase. After most of the conversion from IR is done, there is an AsmPrinter pass which goes through each machine instruction in every basic block of every function. This main loop is in lib/CodeGen/AsmPrinter/AsmPrinter.cpp, in AsmPrinter::EmitFunctionBody(). There are other related functions like EmitFunctionEpilogue and EmitFunctionPrologue. These functions finally call EmitInstruction for the specific architecture, e.g. in lib/Target/X86/X86AsmPrinter.cpp. If you tinker around a bit, you can call MI->getOpcode() and compare it with the opcode enums defined for the architecture to print a log message.

For example, a jump through a register on x86-64 is X86::JMP64r (jumps through a memory operand, such as jmp *(%eax), have their own opcodes, e.g. X86::JMP32m). You can get the associated register using MI->getOperand(0), and so on.

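#include "llvm/Support/Debug.h"  // declares dbgs()
// Inside X86AsmPrinter::EmitInstruction(const MachineInstr *MI):
if (MI->getOpcode() == X86::JMP64r)
  dbgs() << "Found jmp *x instruction\n";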
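To show the shape of that check without needing an LLVM build, here is a small standalone sketch. Everything in it (ToyOpcode, ToyInstr, isIndirectJump, logIndirectJumps) is a hypothetical stand-in I made up for illustration, not LLVM API. It only models the idea: walk the instructions of a function in order and record a log line whenever the opcode is one of the indirect-jump opcodes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the X86 opcode enum and MachineInstr.
// The names mirror LLVM's style, but none of this is the real API.
enum class ToyOpcode { MOV32rm, ADD32rr, JMP32r, JMP32m, RET };

struct ToyInstr {
    ToyOpcode opcode;
    std::string operand;  // jump target as text, e.g. "%eax" or "(%eax)"; empty if unused
};

// True for jumps whose target comes from a register or from memory.
bool isIndirectJump(const ToyInstr &inst) {
    return inst.opcode == ToyOpcode::JMP32r || inst.opcode == ToyOpcode::JMP32m;
}

// Plays the role of the loop in EmitFunctionBody plus the check in
// EmitInstruction: visit each instruction once and collect a log line
// for every indirect jump.
std::vector<std::string> logIndirectJumps(const std::vector<ToyInstr> &body) {
    std::vector<std::string> log;
    for (const ToyInstr &inst : body) {
        if (!isIndirectJump(inst))
            continue;
        assert(!inst.operand.empty() && "an indirect jump needs a target operand");
        log.push_back("Found jmp *" + inst.operand);
    }
    return log;
}
```

In the real backend the log line would go to dbgs() directly instead of being collected in a vector; the vector just makes the sketch easy to check.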

2) Replacing the instruction

The required changes vary depending on the type of replacement you need. If you need more context about registers or previous instructions, you have to implement the changes higher up in the pass chain. There is a representation of instructions called the SelectionDAG (a directed acyclic graph), which stores the dependencies of each instruction on previous instructions. For example, in the sequence

mov myvalue,%rax
jmp *%rax

The DAG would have the jmp node pointing to the mov node (and possibly to other nodes before it), since the value of rax depends on the mov instruction. You can replace the node here with the nodes you need; if done correctly, this changes the final instructions. The SelectionDAG code is at lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp. It is always best to poke around first to figure out the ideal place to make the change. Each IR statement goes through multiple transformations before the DAG is topologically sorted so that the instructions form a linear sequence. The graphs can be viewed using the -view-*-dags options (for example -view-isel-dags) listed by llc --help-hidden.

In my case, I just added a specific check in EmitInstruction and added code to emit the two instructions that I wanted.
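As a rough illustration of the emit-time approach, the following standalone sketch expands a jump through memory into a load plus a jump through a register while "printing" a function. The types and functions (Op, SketchInstr, render, emitFunction) are hypothetical and are not LLVM classes, and the sketch assumes the scratch register holds no live value at the jump. A real backend has to guarantee that, for example by reserving the register.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy types for illustration; these are not LLVM classes.
enum class Op { Mov, JmpReg, JmpMem, Other };

struct SketchInstr {
    Op op;
    std::string src;  // source operand; for Op::Other, the whole instruction text
    std::string dst;  // destination register; empty if unused
};

// Render one instruction as AT&T-style assembly text.
std::string render(const SketchInstr &i) {
    switch (i.op) {
    case Op::Mov:    return "mov " + i.src + ", " + i.dst;
    case Op::JmpReg: return "jmp *" + i.src;
    case Op::JmpMem: return "jmp *" + i.src;
    case Op::Other:  return i.src;
    }
    return "";
}

// Emit every instruction in order, but replace each jump through memory
// with "mov <mem>, <scratch>" followed by "jmp *<scratch>".
// Assumption: the scratch register is dead at every such jump.
std::vector<std::string> emitFunction(const std::vector<SketchInstr> &body,
                                      const std::string &scratch) {
    assert(!scratch.empty() && "a scratch register is required");
    std::vector<std::string> out;
    for (const SketchInstr &inst : body) {
        if (inst.op == Op::JmpMem) {
            out.push_back(render({Op::Mov, inst.src, scratch}));
            out.push_back(render({Op::JmpReg, scratch, ""}));
        } else {
            out.push_back(render(inst));
        }
    }
    return out;
}
```

Doing the expansion this late keeps the change small, but the emitter has no liveness information at that point, which is why the sketch takes the scratch register as a parameter instead of choosing one itself.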

The LLVM documentation is always there, but I found two of Eli Bendersky's articles more helpful than any other resource: "Life of an instruction in LLVM" and "A deeper look into the LLVM code generator". The articles also discuss the very complex TableGen descriptions and the instruction matching process, which is kind of cool if you are interested.
