
Many dynamic languages implement (or want to implement) a JIT compiler in order to speed up their execution times. Inevitably, someone from the peanut gallery asks why they don't use LLVM. The answer is often, "LLVM is unsuitable for building a JIT." (For example, Armin Rigo's comment here.)

Why is LLVM unsuitable for building a JIT?

Note: I know LLVM has its own JIT. If LLVM used to be unsuitable but is now suitable, please say what changed. I'm not talking about running LLVM bytecode on the LLVM JIT; I'm talking about using the LLVM libraries to implement a JIT for a dynamic language.

Sean McMillan
  • Hmm... http://stackoverflow.com/questions/4077396/llvm-jit-speed-up-choices/4097930#4097930 says the answer is "because it's too slow." – Sean McMillan Jul 26 '11 at 17:34
  • -1 LLVM is not considered unsuitable for implementing a JIT. – J D Mar 04 '12 at 13:05
  • Well Jon, I have several good answers below. Maybe you can write one about your I-implemented-a-JIT-with-LLVM-and-it-was-awesome experience? – Sean McMillan Mar 05 '12 at 17:47
  • For my next trick, I'll tell the Iron Chef how to make waffles. (smacks self.) – Sean McMillan Mar 05 '12 at 21:06
  • S'ok. Was worse on this other question where I got 4 downvotes despite being the only answerer to have written several low-latency commercial applications in functional languages! http://stackoverflow.com/a/4479114/13924 – J D Mar 06 '12 at 22:35

6 Answers


Why is LLVM unsuitable for building a JIT?

I wrote HLVM, a high-level virtual machine with a rich static type system including value types, tail call elimination, generic printing, a C FFI, and POSIX threads, with support for both static and JIT compilation. In particular, HLVM offers incredible performance for a high-level VM. I even implemented an ML-like interactive front-end with variant types and pattern matching using the JIT compiler, as seen in this computer algebra demonstration. All of my HLVM-related work combined totals just a few weeks' work (and I am not a computer scientist, just a dabbler).

I think the results speak for themselves and demonstrate unequivocally that LLVM is perfectly suitable for JIT compilation.

J D
  • Interesting. Your work is on statically typed functional languages, where the complaints I heard (linked in the post) were from people implementing dynamically typed imperative/OO languages. I wonder if the typing or functionalness has a bigger impact. – Sean McMillan Mar 05 '12 at 21:49
  • Both static typing and functional style have a big impact but you can address the mismatch. LLVM's own `mem2reg` optimization pass actually transforms imperative code over value types (ints, floats etc.) from memory operations into purely functional single-static-assignment code (the kind HLVM generates naturally). Dynamic typing is harder because it makes it impossible to attain predictably-good performance but some simple solutions should be effective such as representing all values as unboxed unions or compiling functions for all possible combinations of types of arguments on-demand. – J D Mar 05 '12 at 22:01
  • is this thing still alive? for which languages is it being used? – brauliobo Jun 23 '15 at 02:13
  • I haven't actively developed HLVM for many years and AFAIK it has no users. – J D Jun 24 '15 at 15:19
  • Julia actually implements your latter suggestion, with excellent performance. – Demi Jul 03 '15 at 04:29
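
To make the suggestion in J D's comment above concrete, here is a rough sketch (hypothetical names and layout, not HLVM's actual implementation) of an unboxed tagged-union value representation together with per-type-combination specializations:

```cpp
#include <cstdint>
#include <cstdio>

// Every value is an unboxed tagged union, so generated code can branch on
// the tag once and then work on plain machine ints/doubles -- the kind of
// IR that LLVM's mem2reg pass and scalar optimizations handle well.
enum class Tag : uint8_t { Int, Float };

struct Value {
    Tag tag;
    union {
        int64_t i;
        double  f;
    };
};

static Value make_int(int64_t v)  { Value r; r.tag = Tag::Int;   r.i = v; return r; }
static Value make_float(double v) { Value r; r.tag = Tag::Float; r.f = v; return r; }

// The "compile one specialization per combination of argument types" idea:
// a JIT would emit monomorphic bodies like these on demand and cache them
// keyed by the argument tags.
static Value add_int_int(Value a, Value b)     { return make_int(a.i + b.i); }
static Value add_float_float(Value a, Value b) { return make_float(a.f + b.f); }

// Generic entry point: dispatch once on the tags, then run unboxed code.
Value add(Value a, Value b) {
    if (a.tag == Tag::Int   && b.tag == Tag::Int)   return add_int_int(a, b);
    if (a.tag == Tag::Float && b.tag == Tag::Float) return add_float_float(a, b);
    double x = (a.tag == Tag::Int) ? double(a.i) : a.f;
    double y = (b.tag == Tag::Int) ? double(b.i) : b.f;
    return make_float(x + y);  // mixed case: promote to float
}

int main() {
    std::printf("%lld\n", (long long)add(make_int(2), make_int(3)).i);
    std::printf("%g\n", add(make_float(1.5), make_int(2)).f);
}
```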

There are some notes about LLVM in the Unladen Swallow post-mortem blog post: http://qinsb.blogspot.com/2011/03/unladen-swallow-retrospective.html

Unfortunately, LLVM in its current state is really designed as a static compiler optimizer and back end. LLVM code generation and optimization is good but expensive. The optimizations are all designed to work on IR generated by static C-like languages. Most of the important optimizations for optimizing Python require high-level knowledge of how the program executed on previous iterations, and LLVM didn't help us do that.

Mikhail Korobov
  • Many people seem to come to LLVM with the dream that you flick a switch and it will magically optimize your poorly generated code instantaneously but that is not the case. Garbage in, garbage out. If you want LLVM to generate fast code then you must generate optimized IR yourself. Dynamically-typed languages like Python will be particularly hard-hit because any upcasting/boxing destroys the static type information that LLVM's optimization phases rely upon. – J D Mar 05 '12 at 19:34

The biggest complaint is that it takes a long time to start up. However, this is not so much of an issue if you do what Java does: start up in interpreter mode and use LLVM to compile only the most heavily used parts of the program.
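
A minimal sketch of that tiered strategy, with hypothetical names (not Java's or LLVM's actual API): interpret every call, count invocations per function, and hand any function that crosses a hotness threshold to the compiling backend.

```cpp
#include <cstdint>
#include <cstdio>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical per-function record for a tiered JIT.
struct FunctionRecord {
    std::function<int64_t(int64_t)> interpreted;   // tier 0: always available
    std::function<int64_t(int64_t)> compiled;      // tier 1: filled in lazily
    uint64_t call_count = 0;
};

class TieredRuntime {
public:
    static constexpr uint64_t kHotThreshold = 1000;

    void define(const std::string& name, std::function<int64_t(int64_t)> interp) {
        functions_[name].interpreted = std::move(interp);
    }

    int64_t call(const std::string& name, int64_t arg) {
        FunctionRecord& fn = functions_[name];
        if (fn.compiled)                        // already JIT-compiled: use it
            return fn.compiled(arg);
        if (++fn.call_count >= kHotThreshold)   // hot: compile it now
            fn.compiled = compileWithBackend(fn);
        return fn.interpreted(arg);             // cold (or just compiled): interpret
    }

private:
    // Stand-in for lowering the function's IR through a backend such as LLVM;
    // a real JIT would return a pointer to freshly emitted native code.
    std::function<int64_t(int64_t)> compileWithBackend(const FunctionRecord& fn) {
        return fn.interpreted;  // placeholder only
    }

    std::unordered_map<std::string, FunctionRecord> functions_;
};

int main() {
    TieredRuntime rt;
    rt.define("double", [](int64_t x) { return x * 2; });        // "interpreted" body
    int64_t sum = 0;
    for (int i = 0; i < 2000; ++i) sum += rt.call("double", i);  // compiled at call 1000
    std::printf("%lld\n", (long long)sum);
}
```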

Also, while there are arguments like this scattered all over the internet, Mono has been using LLVM as a JIT compiler successfully for a while now (though it's worth noting that it defaults to its own faster but less efficient backend, and that they also modified parts of LLVM).

For dynamic languages, LLVM might not be the right tool, just because it was designed for optimizing system programming languages like C and C++ which are strongly/statically typed and support very low level features. In general the optimizations performed on C don't really make dynamic languages fast, because you're just creating an efficient way of running a slow system. Modern dynamic language JITs do things like inlining functions that are only known at runtime, or optimizing based on what type a variable has most of the time, which LLVM is not designed for.
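
For concreteness, here is a rough sketch (hypothetical types and names, not tied to any particular JIT) of the "optimize based on what type a variable has most of the time" idea: profile the types observed at a call site, then compile a guarded fast path for the dominant type and bail out to the generic path when the guard fails.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical runtime type tags for a dynamic language.
enum class Type : uint8_t { Int, Float, String, Unknown };

struct Value {
    Type type;
    int64_t bits;   // payload (integer value, pointer, ...) -- simplified
};

// Per-call-site profile: remember which type was seen most often, so the
// JIT can speculate on it when it compiles this site.
struct SiteProfile {
    uint32_t counts[4] = {0, 0, 0, 0};
    void record(Type t) { counts[(int)t]++; }
    Type dominant() const {
        int best = 0;
        for (int i = 1; i < 4; ++i)
            if (counts[i] > counts[best]) best = i;
        return (Type)best;
    }
};

// What the speculatively compiled code amounts to: a cheap type guard in
// front of an unboxed fast path, with a fallback to the generic path
// (deoptimization) when the guess turns out to be wrong.
int64_t speculative_add(Value a, Value b, const SiteProfile& profile,
                        int64_t (*generic_add)(Value, Value)) {
    if (profile.dominant() == Type::Int &&
        a.type == Type::Int && b.type == Type::Int) {
        return a.bits + b.bits;        // fast path: plain machine add
    }
    return generic_add(a, b);          // guard failed: fall back
}

int main() {
    SiteProfile profile;
    for (int i = 0; i < 100; ++i) profile.record(Type::Int);  // site mostly sees ints
    Value a{Type::Int, 2}, b{Type::Int, 3};
    auto generic = [](Value, Value) -> int64_t { return 0; }; // stand-in generic path
    std::printf("%lld\n", (long long)speculative_add(a, b, profile, generic));
}
```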

parkovski
  • C/C++ are _not strongly typed_. Strongly typed means each value is permanently associated with a type that cannot be changed, which is true of most dynamic languages, but is not true of C and C++ with their reinterpret cast. – Jan Hudec Dec 18 '13 at 07:44
  • @JanHudec you can do the same exact thing in Haskell, does that make it weakly typed? – alternative Nov 13 '14 at 01:47
  • @alternative: Are you saying that you can give Haskell a block of raw bytes and tell it to start treating it as, for example, `IO()`? In the core language? (The unsafe part intended for binding C functions must be able to do that kind of thing, but that is an interoperability extension and I wouldn't count that as Haskell). – Jan Hudec Nov 13 '14 at 09:54
  • @JanHudec https://hackage.haskell.org/package/base-4.6.0.1/docs/Unsafe-Coerce.html. So yes. I'm not arguing that you are wrong according to common convention, but that the logic that drives the common convention is simply wrong. – alternative Nov 13 '14 at 17:51
  • I'm looking at this area in a bit more depth, and LLVM appears to work fine for JITting dynamic languages _providing you can establish what the concrete types are_ and are able to throw stuff away and backtrack to a different strategy when you're wrong. – Donal Fellows Nov 24 '14 at 10:20
  • @alternative in C: `uint32_t t = -1; char y = t;` will compile in C, whereas in Python or Haskell `x = 0, y = '1', x + y` will cause an error. The logic makes sense. – Justin Raymond Aug 11 '16 at 03:09
  • @JustinRaymond `y = unsafeCoerce t`. Your logic doesn't hold. – alternative Aug 11 '16 at 17:24
  • @alternative the idea is that in _weakly_-typed languages values may be cast between types *implicitly*; in a _strongly_ typed language this does not happen. In the example I gave, implicit casts in C compile, but cause errors in Haskell and Python. – Justin Raymond Aug 11 '16 at 17:38
  • @JustinRaymond Where do you see a cast in the haskell code I provided? Incidentally, in python, you could make the same argument with the fact that `0.0 + 0` would work in python and C (and haskell w/ unsafeCoerce) so the boundaries are less clear than you are assuming. You claim that Python is strongly typed, but it has the same problem. The problem is not implicit casting, it is implicit coercion, which does not exist in C and as such I don't see your argument. Such a classification is useless and depends on your own personal definitions of "strong". – alternative Aug 11 '16 at 22:15

There is a presentation on using LLVM as a JIT backend where they address many of the concerns raised about why it's bad; most of them seem to boil down to people building a static compiler as a JIT instead of building an actual JIT.

Necrolis

Update: as of 7/2014, LLVM has added a feature called "Patch Points", which is used to support polymorphic inline caches in Safari's FTL JavaScript JIT. This covers exactly the use case complained about in Armin Rigo's comment in the original question.
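
To illustrate what typically gets patched into that reserved space, here is a conceptual sketch of a polymorphic inline cache (illustrative C++ only, not Safari's or LLVM's actual code): a short shape-to-method table that is checked before falling back to the generic lookup.

```cpp
#include <cstdint>
#include <cstdio>

// Conceptual sketch of a polymorphic inline cache (PIC): a short table of
// (object shape -> compiled method) entries that the JIT checks before
// doing a full dynamic lookup. Names and layout are illustrative only.
using ShapeId = uint32_t;                  // identifies an object's hidden class
using Method  = int64_t (*)(void* self);   // pointer to a compiled method stub

struct PicEntry { ShapeId shape; Method target; };

struct InlineCache {
    static constexpr int kMaxEntries = 4;  // beyond this a JIT goes "megamorphic"
    PicEntry entries[kMaxEntries] = {};
    int count = 0;
};

// Stand-in for the runtime's full method lookup (e.g. walking a class table).
static int64_t methodForShape0(void*) { return 42; }
static Method genericLookup(ShapeId /*shape*/) { return &methodForShape0; }

int64_t callMethod(InlineCache& ic, ShapeId shape, void* self) {
    // Fast path: a few compare-and-branch checks -- exactly the kind of code
    // a JIT wants to patch directly into the emitted instruction stream.
    for (int i = 0; i < ic.count; ++i)
        if (ic.entries[i].shape == shape)
            return ic.entries[i].target(self);

    // Miss: do the expensive lookup, then remember the result if there is room.
    Method m = genericLookup(shape);
    if (ic.count < InlineCache::kMaxEntries)
        ic.entries[ic.count++] = {shape, m};
    return m(self);
}

int main() {
    InlineCache ic;
    std::printf("%lld\n", (long long)callMethod(ic, 0, nullptr));  // miss, then cached
    std::printf("%lld\n", (long long)callMethod(ic, 0, nullptr));  // hit on the fast path
}
```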

Sean McMillan

For a more detailed rant about LLVM IR, see here: LLVM IR is a compiler IR.

Sean McMillan