54

I have heard that Java must use a JIT to be fast. This makes perfect sense when comparing to interpretation, but why can't someone make an ahead-of-time compiler that generates fast Java code? I know about gcj, but I don't think its output is typically faster than HotSpot, for example.

Are there things about the language that make this difficult? I think it comes down to just these things:

  • Reflection
  • Classloading

What am I missing? If I avoid these features, would it be possible to compile Java code once to native machine code and be done?

Adam Goode
  • 7,380
  • 3
  • 29
  • 33

10 Answers

45

A JIT compiler can be faster because the machine code is being generated on the exact machine that it will also execute on. This means that the JIT has the best possible information available to it to emit optimized code.

If you pre-compile bytecode into machine code, the compiler cannot optimize for the target machine(s), only the build machine.
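As an illustration (a hedged sketch; the class name and loop are made up, and the vectorization itself is invisible at the source level): HotSpot may auto-vectorize a hot loop like the one below using whatever SIMD extensions (SSE/AVX) the host CPU actually reports, whereas an AOT binary compiled for generic x86 has to stay conservative.

```java
public class SumDemo {
    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 100;
        }
        long total = 0;
        // A hot reduction loop like this is a candidate for auto-vectorization;
        // the JIT can emit SSE or AVX instructions because it knows at run time
        // exactly which extensions the current CPU supports.
        for (int v : data) {
            total += v;
        }
        System.out.println(total); // 10,000 repetitions of 0+1+...+99
    }
}
```

Running with `java -XX:+PrintCompilation SumDemo` shows when HotSpot compiles the method containing the loop.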

Andrew Hare
  • 344,730
  • 71
  • 640
  • 635
  • Nice one there. Clear and concise. – o.k.w Dec 10 '09 at 04:53
  • Why do we not use a JIT for C/C++ then? I guess that's where LLVM comes in. – Adam Goode Dec 10 '09 at 04:57
  • @Ibrahim - we are talking here about variations in instruction sets; e.g. which of the many extensions to the x86 instruction set are available on the execution platform. – Stephen C Dec 10 '09 at 05:21
  • 16
    Ahead of time compilers can still match that. The Intel C++ compiler, for example, can be told to emit multiple versions of a piece of code, each one tuned for a slightly different processor target. It'll add code to autodetect the processor at startup and select the most appropriate code path. – Boojum Dec 10 '09 at 06:13
  • @Adam. The traditional approach to C programs has been that the object file is native code with no further processing. This makes sense if CPU-time is a premium, which it used to be in the old days. Today memory access is the premium on PC's, so it doesn't matter if you spend a lot of effort post processing. – Thorbjørn Ravn Andersen Dec 10 '09 at 07:22
  • Almost correct, you can optimize for _a_ target machine, most compilers will allow you to specify the target architecture. However you can only optimize for _one_ target machine, a JIT will always optimize for the target machine. – Paul Wagland Dec 10 '09 at 07:41
  • How much speedup will this precise targeting gain? When Fedora benchmarked tuning these settings, they only saw 1-2% gains: https://www.redhat.com/archives/fedora-devel-list/2009-June/msg01506.html – Adam Goode Dec 10 '09 at 18:05
  • 1
    For most programs, the machine specific optimization is NOT significant. I don't think this is the point of JIT. – xuhdev Nov 03 '14 at 19:09
  • @Boojum rumor has it that the code path chosen for non-Intel CPUs was not optimal. It may have been improved since. – Thorbjørn Ravn Andersen Dec 24 '14 at 20:02
  • .NET and LLVM have the concept of intermediate code where the machine code can be generated on the target environment. This can be done at install time, where the installer executes the AOT compiler, making the runtime environment available. The JIT introduces its own performance penalty, especially if it's running in a separate process. A context switch can easily negate any benefits of optimization. – ATL_DEV Jan 11 '20 at 19:14
34

I will paste an interesting answer given by James Gosling in the book Masterminds of Programming.

Well, I’ve heard it said that effectively you have two compilers in the Java world. You have the compiler to Java bytecode, and then you have your JIT, which basically recompiles everything specifically again. All of your scary optimizations are in the JIT.

James: Exactly. These days we’re beating the really good C and C++ compilers pretty much always. When you go to the dynamic compiler, you get two advantages when the compiler’s running right at the last moment. One is you know exactly what chipset you’re running on. So many times when people are compiling a piece of C code, they have to compile it to run on kind of the generic x86 architecture. Almost none of the binaries you get are particularly well tuned for any of them. You download the latest copy of Mozilla, and it’ll run on pretty much any Intel architecture CPU. There’s pretty much one Linux binary. It’s pretty generic, and it’s compiled with GCC, which is not a very good C compiler.

When HotSpot runs, it knows exactly what chipset you’re running on. It knows exactly how the cache works. It knows exactly how the memory hierarchy works. It knows exactly how all the pipeline interlocks work in the CPU. It knows what instruction set extensions this chip has got. It optimizes for precisely what machine you’re on. Then the other half of it is that it actually sees the application as it’s running. It’s able to have statistics that know which things are important. It’s able to inline things that a C compiler could never do. The kind of stuff that gets inlined in the Java world is pretty amazing. Then you tack onto that the way the storage management works with the modern garbage collectors. With a modern garbage collector, storage allocation is extremely fast.

Community
  • 1
  • 1
Edwin Dalorzo
  • 76,803
  • 25
  • 144
  • 205
  • 3
    this is interesting. How about I compile my program exactly on the machine I run, so the performance would be good and I don't need the whole VM running. And this tends to happen a lot since we run in an environment that most of machines are identical. – Dzung Nguyen Dec 21 '14 at 06:15
  • @DzungNguyen, Even better, how about using a compiler like LLVM which generates intermediate code that is compiled into machine code for a specific runtime environment upon installation. Remember one very critical fact: JITs are written in AOT-compiled languages. Anything a JIT can do, so can an AOT. – ATL_DEV Jan 11 '20 at 17:57
  • 1
    @DzungNguyen You would just choose another JVM optimized for this scenario instead. – Thorbjørn Ravn Andersen Feb 14 '21 at 21:14
24

The real killer for any AOT compiler is:

Class.forName(...)

This means that you cannot write an AOT compiler which covers ALL Java programs, as some information about the program is available only at runtime. You can, however, do it for a subset of Java, which is what I believe gcj does.
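A minimal sketch of why (the class name and default are made up for illustration): the class to load below is chosen by data that only exists at run time — a command-line argument here, but it could equally be a config file or network input — so no AOT compiler can enumerate in advance the classes it must compile.

```java
public class ForNameDemo {
    public static void main(String[] args) throws Exception {
        // The class name is runtime data; an AOT compiler cannot know
        // which class this string will name when the program runs.
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        Class<?> cls = Class.forName(name);
        Object instance = cls.getDeclaredConstructor().newInstance();
        System.out.println(instance.getClass().getName());
    }
}
```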

Another typical example is the ability of a JIT to inline methods like getX() directly into the calling methods if it finds it safe to do so, and to undo that inlining if appropriate, even when the programmer has not helped by declaring the method final. The JIT can see that in the running program a given method is not overridden, and can therefore, in this instance, treat it as final. This might be different in the next invocation.
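A sketch of the getX() case (the class and names here are invented for illustration): the getter is not declared final, yet as long as no overriding subclass has been loaded, HotSpot can treat the hot call site as monomorphic, inline it, and deoptimize later if a subclass ever appears.

```java
class Point {
    private final int x;
    Point(int x) { this.x = x; }
    // Not declared final, but the JIT may still inline calls to it while
    // no loaded subclass overrides it; loading such a subclass later
    // triggers deoptimization and recompilation of the callers.
    int getX() { return x; }
}

public class InlineDemo {
    public static void main(String[] args) {
        Point p = new Point(7);
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += p.getX(); // hot, monomorphic call site
        }
        System.out.println(sum);
    }
}
```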


Edit 2019: Oracle has introduced GraalVM, which allows AOT compilation of a subset of Java (a quite large one, but still a subset), with the primary requirement that all code is available at compile time. This allows millisecond startup times for web containers.

Thorbjørn Ravn Andersen
  • 73,784
  • 33
  • 194
  • 347
  • 1
    All class loading happen at runtime. I do not understand your comment. – Thorbjørn Ravn Andersen Dec 10 '09 at 18:25
  • 1
    I mean, classloading that couldn't be predicted at bytecode-generation time. Class.forName takes a string and produces a class. There is no way to know what class it might be. If you didn't do this, a AOT compiler could know all the classes you might use and do some optimizations then, right? – Adam Goode Dec 10 '09 at 20:40
  • 1
    Yes, this is exactly why it breaks. – Thorbjørn Ravn Andersen Dec 10 '09 at 23:13
  • 7
    Class.forName() for __known__ classloaders can be handled by resolving to precompiled classes just fine. We did that for Eclipse RCP and Tomcat classloaders, not to mention system and application: http://www.excelsiorjet.com – Dmitry Leskov Feb 16 '11 at 07:36
  • @Dmity, very interesting. I did not know that this was feasible. – Thorbjørn Ravn Andersen Feb 16 '11 at 09:51
  • There's also no reason why an AOT compiler can't be available at runtime. Then loading arbitrary code is just a matter of compiling it and linking it into the running executable. The line between AOT and JIT is just as fuzzy as the line between compilers and interpreters. – John Cowan Jan 27 '21 at 12:41
  • @john cowan I would not consider that AOT – Thorbjørn Ravn Andersen Jan 27 '21 at 15:34
  • @ThorbjørnRavnAndersen: It is AOT in the sense that it does not use runtime feedback to optimize running code. An example is the Common Lisp compiler SBCL; if you load a source file at runtime, it will be compiled and linked into the existing program (SBCL does not provide standalone executables in the usual sense, though it can dump an image.) – John Cowan Feb 13 '21 at 22:07
  • @JohnCowan A JIT does not need to use runtime feedback either. To my understanding the initial JVM JIT didn't. – Thorbjørn Ravn Andersen Feb 14 '21 at 21:12
23

Java's JIT compiler is also lazy and adaptive.

Lazy

Being lazy, it only compiles methods when they are first reached, instead of compiling the whole program (very useful if you don't use part of a program). Class loading actually helps make the JIT faster by allowing it to ignore classes it hasn't come across yet.

Adaptive

Being adaptive, it emits a quick and dirty version of the machine code first, and only goes back and does a thorough job if that method is used frequently.
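One way to observe both behaviors (a sketch; the class is made up and the log format varies between JVM versions) is to run a program under `java -XX:+PrintCompilation`: methods only show up in the log once they are actually invoked, and hot methods reappear as the adaptive JIT promotes them to higher optimization tiers.

```java
public class TieredDemo {
    // A small method that becomes "hot" and gets promoted by the adaptive JIT.
    static int square(int n) {
        return n * n;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 100_000; i++) {
            total += square(i % 10); // 100,000 calls make square() hot
        }
        System.out.println(total);
        // Run as: java -XX:+PrintCompilation TieredDemo
        // and watch square() appear in the compilation log.
    }
}
```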

Luke Quinane
  • 16,447
  • 13
  • 69
  • 88
  • 12
    Another aspect of its adaptiveness is that it can gather stats on the likely outcome of tests / branches while interpreting bytecodes, and feed this into the JIT compiler to produce better code. – Stephen C Dec 10 '09 at 05:19
12

In the end it boils down to the fact that having more information enables better optimizations. In this case, the JIT has more information about the actual machine the code is running on (as Andrew mentioned) and it also has a lot of runtime information that is not available during compilation.

Tal Pressman
  • 7,199
  • 2
  • 30
  • 33
  • 2
    LLVM has the same information too and oddly so does Linux. You can get the same or similar benefits by compiling your code on the target machine. – ATL_DEV Jan 11 '20 at 19:03
8

In theory, a JIT compiler has an advantage over AOT if it has enough time and computational resources available. For instance, if you have an enterprise app running for days and months on a multiprocessor server with plenty of RAM, the JIT compiler can produce better code than any AOT compiler.

Now, if you have a desktop app, things like fast startup and initial response time (where AOT shines) become more important, plus the computer may not have sufficient resources for the most advanced optimizations.

And if you have an embedded system with scarce resources, JIT has no chance against AOT.

However, the above was all theory. In practice, creating such an advanced JIT compiler is way more complicated than a decent AOT one. How about some practical evidence?

Dmitry Leskov
  • 3,233
  • 1
  • 20
  • 17
  • Hmm, that is a interesting link, but I would be more interested to see a comparison with gcj instead of gcc. – Adam Goode Dec 10 '09 at 17:42
  • Stefan's previous benchmarking session (http://stefankrause.net/wp/?p=6) included gcj and Apache Harmony, but it is a bit more outdated. Also, comparisons with those implementations are not perfectly correct as they are not tested for compliance with the Java SE spec. There are some overheads in a fully compliant implementation, one of them related to stack overflow handling (pun intended :) ). – Dmitry Leskov Dec 11 '09 at 06:20
  • What about context switching? Every time the JIT has to go back and refine the optimization, it requires switching threads or, worse, an entire process. LLVM does a lot of similar optimizations, since it generates intermediate code which is turned into machine code on the target environment. – ATL_DEV Jan 11 '20 at 19:01
7

Java's ability to inline across virtual method boundaries and perform efficient interface dispatch requires runtime analysis before compiling - in other words it requires a JIT. Since all methods are virtual and interfaces are used "everywhere", it makes a big difference.

Sam Harwell
  • 97,721
  • 20
  • 209
  • 280
  • @28 - in theory, you can figure out (conservatively) the complete set of classes that could be used in a given Java program, simply by examining the source code or bytecode. Therefore, you COULD do these optimizations statically. – Stephen C Dec 10 '09 at 05:28
  • You're still compiling most everything. e.g. method calls based upon IO. – Jé Queue Dec 10 '09 at 07:36
  • Some C++ compilers will use profiling information. Static analysis can be used to inline many virtual method calls. – Tom Hawtin - tackline Dec 10 '09 at 08:38
  • Whole Program Optimization does that in MSVC. – hoodaticus Sep 28 '16 at 15:24
  • LLVM does the same thing, so does .NET and Linux. If you compile the code on the target machine, you have access to the same data that the JIT has. What's the difference? C++ methods are not virtual by default. – ATL_DEV Jan 11 '20 at 19:07
  • According to https://devblogs.microsoft.com/java/aot-compilation-in-hotspot-introduction/ modern versions of OpenJDK can AOT compile. – Thorbjørn Ravn Andersen Feb 14 '21 at 21:17
6

JITs can identify and eliminate some conditions which can only be known at runtime. A prime example is the virtual call elimination that modern VMs use - e.g., when the JVM finds an invokevirtual or invokeinterface instruction, if only one loaded class overrides the invoked method, the VM can actually make that virtual call static and is thus able to inline it. To a C program, on the other hand, a function pointer is always a function pointer, and a call through it can't be inlined (in the general case, anyway).

Here's a situation where the JVM is able to inline a virtual call:

interface I {
    I INSTANCE = Boolean.getBoolean("someCondition") ? new A() : new B();
    void doIt();
}
class A implements I {
    public void doIt(){ ... }
}
class B implements I {
    public void doIt(){ ... }
}
// later...
I.INSTANCE.doIt();

Assuming we don't go around creating A or B instances elsewhere and that someCondition is set to true, the JVM knows that the call to doIt() always means A.doIt, and can therefore avoid the method table lookup, and then inline the call. A similar construct in a non-JITted environment would not be inlinable.

gustafc
  • 28,465
  • 7
  • 73
  • 99
  • I don't know what you're talking about here. C doesn't support virtual functions, but if you mean C++, it does indeed support inline functions, regular functions and virtual functions, since it’s a superset of C. An inline function is code that replaces a function call with the actual function’s body. This eliminates the need to set up a stack frame, which incurs a small performance penalty. There’s a slight performance gain for virtual functions, since the entire vtable can be removed. Most decent C++ compilers will warn or suggest inlining a function, or even convert it for you automatically. – ATL_DEV Jan 11 '20 at 18:54
  • Most C++ programs don’t benefit significantly from the kind of function inlining the JIT is doing. The time it takes to setup and make a function call is trivial, especially for large functions. In comparison, a JIT optimization may actually incur a greater performance penalty, especially in the cases where the function is only called once. Don’t forget the hidden performance cost of switching processes to execute those optimizations, which is exponentially slower than a function or virtual function call. – ATL_DEV Jan 11 '20 at 18:54
  • @ATL_DEV This question is about ahead-of-time Java compilation, so what C, C++ or any other non-Java (or perhaps non-JVM) language does is off topic. My response is all about why AOT Java will miss out on some optimizations. – gustafc Jan 13 '20 at 09:31
  • OK. Well then my comment is the reciprocal: why c++ wouldn’t benefit much from AOT optimizations. – ATL_DEV Jan 14 '20 at 14:46
2

I think the fact that the official Java compiler is a JIT compiler is a large part of this. How much time has been spent optimizing the JVM vs. a machine code compiler for Java?

Brendan Long
  • 53,280
  • 21
  • 146
  • 188
2

Dmitry Leskov is absolutely right here.

All of the above is just theory about what could make a JIT faster; implementing every scenario is almost impossible. Besides, since we only have a handful of different instruction-set extensions on x86_64 CPUs, there is very little to gain by targeting every extension of the current CPU. I always go by the rule of targeting x86_64 and SSE4.2 when building performance-critical applications in native code.

Java's fundamental structure causes a ton of limitations; JNI can show you just how inefficient it is, and the JIT is only sugarcoating this by making things faster overall. Besides the fact that every method is virtual by default, Java also uses class types at runtime, as opposed to, for example, C++. C++ has a great advantage here when it comes to performance, because no class object needs to be loaded at runtime: classes are just blocks of data that get allocated in memory and only initialized when requested. In other words, C++ doesn't have class types at runtime, whereas Java classes are actual objects, not just templates. I'm not going to go into GC because that's irrelevant here.

Java strings are also slower because they use dynamic string pooling, which requires the runtime to do string searches in the pool table. Many of these things are due to the fact that Java wasn't originally built to be fast, so its foundation will always be slow. Most native languages (primarily C/C++) were specifically built to be lean and mean, with no waste of memory or resources. The first few versions of Java were in fact terribly slow and wasteful with memory, with lots of unnecessary metadata for variables and whatnot. As it stands today, the claim that a JIT is capable of producing faster code than AOT languages will remain a theory.
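The string pooling mentioned above can be observed directly (a minimal sketch; whether pooling is a net performance cost is this answer's claim, not something the snippet proves):

```java
public class InternDemo {
    public static void main(String[] args) {
        String a = "hello";              // compile-time constant: pooled
        String b = new String("hello");  // explicit new object: not pooled
        System.out.println(a == b);          // distinct objects
        System.out.println(a == b.intern()); // intern() returns the pooled copy
    }
}
```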

Think about all the bookkeeping the JIT needs to do for lazy compilation: incrementing a counter each time a function is called, checking how many times it has been called, and so on. Running the JIT takes a lot of time. In my eyes the tradeoff is not worth it. And this is just on PCs.

Ever tried to run Java on a Raspberry Pi or other embedded devices? Absolutely terrible performance. JavaFX on a Raspberry Pi? Not even functional. Java and its JIT are very far from delivering everything that is advertised and the theory people blindly repeat about it.

  • C++ was definitively not built to be mean and lean originally, and it took quite a while to get there. – Thorbjørn Ravn Andersen Jul 25 '18 at 23:18
  • I remember some C++ compilers had an option to turn on runtime typing information or RTTI. It's really just metadata stored when the class is allocated. There's a small storage penalty and a small penalty whenever you request an object's type information. – ATL_DEV Jan 11 '20 at 17:51