
In the LLVM tutorials and examples, the compiler emits LLVM IR by making calls like this:

return Builder.CreateAdd(L, R, "addtmp");

but many interpreters are written like this:

switch (opcode) {
     case ADD:
             result = L + R;
             break;
     ...

How would you extract each of these code snippets to make a JIT with LLVM without having to re-implement each opcode in LLVM IR?

joeforker

1 Answer


Okay, first refactor each of your code snippets into its own function. So your code becomes:

void addOpcode(uint32_t *result, uint32_t L, uint32_t R) {
    *result = L + R;
}

switch (opcode) {
    case ADD:
            addOpcode(&result, L, R);
            break;
     ....

Okay, so after doing this your interpreter should still run. Now take all the new functions and place them in their own file. Compile that file to LLVM bitcode using either llvm-gcc or clang, and then, instead of generating native code, run it through the "cpp" backend (llc -march=cpp). That will generate C++ code that instantiates the bitcode for the compilation unit through LLVM API calls. You can specify options to limit it to specific functions, etc. You probably want to use -cppgen=module. Note that the cpp backend was removed in LLVM 3.9, so this step needs an older release.
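
As a rough illustration of the refactoring step, here is a minimal sketch of an interpreter whose opcodes have been pulled out into standalone functions. Only `addOpcode` and `ADD` come from the answer; `subOpcode`, `SUB`, `Instr`, `run`, and the accumulator-style loop are hypothetical names invented for the example. The `extern "C"` linkage is an assumption on my part: it keeps the symbol names unmangled, which should make the functions easier to find by name once the file is compiled to bitcode.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical opcode set; the original only names ADD.
enum Opcode : uint8_t { ADD, SUB };

// --- Opcode implementations: these would move to their own file ---
// extern "C" avoids C++ name mangling (an assumption, not from the answer).
extern "C" void addOpcode(uint32_t *result, uint32_t L, uint32_t R) {
    *result = L + R;
}

extern "C" void subOpcode(uint32_t *result, uint32_t L, uint32_t R) {
    *result = L - R;
}

// --- Interpreter loop: only dispatches, no opcode logic of its own ---
// Hypothetical instruction format: an opcode plus one immediate operand.
struct Instr {
    Opcode op;
    uint32_t operand;
};

// Applies each instruction to an accumulator: acc = acc <op> operand.
uint32_t run(const std::vector<Instr> &program, uint32_t acc) {
    for (const Instr &in : program) {
        uint32_t result = 0;
        switch (in.op) {
            case ADD: addOpcode(&result, acc, in.operand); break;
            case SUB: subOpcode(&result, acc, in.operand); break;
        }
        acc = result;
    }
    return acc;
}
```

Because the switch now contains nothing but calls, the same opcode functions can serve both the plain interpreter and the JIT.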

Now, back in your interpreter loop, glue together calls to the generated C++ code instead of directly executing the original code, then pass the result to some optimizers (which can inline the opcode functions) and a native code generator. Gratz on the JIT ;-) You can see an example of this in a couple of LLVM projects, like the vm_ops in llvm-lua.
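
To make the "glue together calls" idea concrete, here is a small sketch that walks a bytecode sequence and emits one call per opcode. It is not the approach from the answer verbatim: to stay self-contained it writes textual LLVM IR into a string instead of using the generated C++ or the LLVM API (where `Builder.CreateCall` would play this role). The names `emitIR`, `opFunction`, `Instr`, `subOpcode`, and `@jitted` are hypothetical, and the `ptr` type assumes a recent LLVM with opaque pointers; older releases would spell it `i32*`.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

enum Opcode : uint8_t { ADD, SUB };  // hypothetical opcode set

struct Instr {  // hypothetical instruction: opcode plus one immediate
    Opcode op;
    uint32_t operand;
};

// Maps an opcode to the name of the extracted function that implements it.
static const char *opFunction(Opcode op) {
    switch (op) {
        case ADD: return "addOpcode";
        case SUB: return "subOpcode";
    }
    return "unknownOpcode";
}

// Emits a function @jitted that threads an accumulator through one call
// per bytecode instruction. The opcode bodies are only declared here; they
// would come from the separately compiled opcode module.
std::string emitIR(const std::vector<Instr> &program) {
    std::ostringstream ir;
    ir << "declare void @addOpcode(ptr, i32, i32)\n"
       << "declare void @subOpcode(ptr, i32, i32)\n\n"
       << "define i32 @jitted(i32 %acc0) {\n"
       << "entry:\n"
       << "  %result = alloca i32\n";
    std::size_t n = 0;
    for (const Instr &in : program) {
        // Print the immediate as a signed value, which is always a valid i32.
        ir << "  call void @" << opFunction(in.op) << "(ptr %result, i32 %acc"
           << n << ", i32 " << static_cast<int32_t>(in.operand) << ")\n";
        ++n;
        ir << "  %acc" << n << " = load i32, ptr %result\n";
    }
    ir << "  ret i32 %acc" << n << "\n}\n";
    return ir.str();
}
```

Once IR like this is linked with the opcode module, the inliner can replace each call with the opcode body, which is where the speedup over the switch loop would come from.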

Louis Gerbarg
  • Wonderful! I thought it would be something like that, with LLVM inlining all the functions. – joeforker Feb 03 '09 at 14:53
  • How does this compare to a call-threaded interpreter where you only JIT a series of CALL instructions to each bytecode implementation, inline the implementation of only a few opcodes most likely BRANCH opcodes, and each opcode implementation ends with RET? – joeforker Feb 03 '09 at 14:55
  • I don't quite follow this. Are you saying to pass all the opcode functions into LLVM, and when you output it back to C, it will automatically have a JIT built in? – Unknown Apr 08 '09 at 07:08
  • You are not outputting it as C; you are outputting C++ code that instantiates the in-memory compiled byte code representation of the function. – Louis Gerbarg Apr 13 '09 at 18:49