Why Do Interpreters Compile the Code Every Time a Program Is Run?
They don't. An interpreter never compiles. It interprets. If it compiled, it would be a compiler, not an interpreter.
Interpreters interpret, compilers compile.
My question is about all interpreted languages, but to illustrate my point better I will use Java as an example.
There is no such thing as an interpreted language. Whether an interpreter or a compiler is used is purely a trait of the implementation and has absolutely nothing whatsoever to do with the language.
Every language can be implemented by either an interpreter or a compiler. The vast majority of languages have at least one implementation of each type. (For example, there are interpreters for C and C++ and there are compilers for JavaScript, PHP, Perl, Python and Ruby.) Besides, the majority of modern language implementations actually combine both an interpreter and a compiler (or even multiple compilers).
A language is just a set of abstract mathematical rules. An interpreter is one of several concrete implementation strategies for a language. Those two live on completely different abstraction levels. If English were a typed language, the term "interpreted language" would be a type error. The statement "Python is an interpreted language" is not just false (because being false would imply that the statement even makes sense, even if it is wrong), it just plain doesn't make sense, because a language can never be defined as "interpreted."
What I know about Java is that when programmers write their code, they have to compile it into Java byte codes, which are like machine language for a universal Java virtual machine architecture. Then they can distribute their code to any machine that runs the Java Virtual Machine (JVM).
That is not true. There is nothing in the Java Language Specification that requires bytecode. There isn't even anything in there that requires Java to be compiled at all. It is perfectly legal and specification-compliant to interpret Java or to compile it to native machine code, and in fact, both have been done.
Also, I am curious: in this paragraph, you describe Java as a language that is always compiled, yet in the previous paragraph, you use Java as an example of an interpreted language. That makes no sense.
The JVM is then just a program that takes the Java byte codes and compiles them (for the specific architecture) every time I run my program.
Again, there is nothing in the Java Virtual Machine Specification that says anything about compilation or interpretation at all, let alone when or how often code is compiled.
It is perfectly legal and specification-compliant to interpret JVML bytecode, it is likewise perfectly compliant to compile it once, and in fact, both have been done.
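To make the distinction concrete, here is a minimal sketch of what "interpreting bytecode" means. This is not the JVML instruction set (the opcodes PUSH, ADD, and PRINT are invented for the example); it only shows the shape of an interpreter: a loop that dispatches on each opcode and carries out its effect directly, without ever emitting native machine code.

```java
// Hedged sketch: a toy stack-machine interpreter for a made-up 3-opcode
// bytecode. It is NOT the JVML instruction set; it only illustrates the
// dispatch-loop structure an interpreter uses.
import java.util.ArrayDeque;
import java.util.Deque;

public class ToyInterpreter {
    static final int PUSH = 0, ADD = 1, PRINT = 2;

    public static void run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;                                   // program counter
        while (pc < code.length) {
            switch (code[pc++]) {                     // dispatch on the opcode ...
                case PUSH  -> stack.push(code[pc++]); // ... and perform its effect directly
                case ADD   -> stack.push(stack.pop() + stack.pop());
                case PRINT -> System.out.println(stack.pop());
                default    -> throw new IllegalStateException("bad opcode");
            }
        }
    }

    public static void main(String[] args) {
        run(new int[] { PUSH, 2, PUSH, 3, ADD, PRINT });  // prints 5
    }
}
```

A compiler for the same toy bytecode would instead translate the opcode sequence into native machine code once and then execute that; the JVM specification allows either strategy (or a mix of both).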
From my understanding (please correct me if I am wrong here) if I run my code the JVM will compile it on the fly, my machine will run the compiled instructions, and when I close the program all the compilation work will be lost, only to be done again, the second time I want to run my program.
This totally depends on which JVM you are using, which version of which JVM you are using, and sometimes even on the specific environment and/or commandline parameters.
Some JVMs interpret the bytecode (e.g. old versions of Sun's JVM). Some compile the bytecode once, ahead of time (e.g. Excelsior JET). Some interpret the bytecode at first, collect profiling information and statistics while the program is running, use this data to find the so-called "hot spots" (i.e. the code that is executed most often and thus benefits the most from being sped up), and then compile those hot spots using the profiling data for optimization (e.g. IBM J9, Oracle HotSpot). Some use a similar trick but have a fast, non-optimizing compiler instead of the interpreter. Some cache and re-use the compiled native machine code (e.g. the now-abandoned JRockit).
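To make the "hot spot" idea concrete, here is a rough, hypothetical sketch of mixed-mode execution. The names (TieredVm, jitCompile, COMPILE_THRESHOLD) and the threshold value are all invented; real JVMs such as HotSpot and J9 do this with far more sophistication (per-method and per-loop counters, multiple compiler tiers, on-stack replacement, and so on).

```java
// Hedged sketch of mixed-mode execution: interpret a method until its
// invocation counter crosses a threshold, then compile it and use the
// compiled version from then on. All names here are hypothetical.
import java.util.HashMap;
import java.util.Map;

class TieredVm {
    interface Bytecode {}
    interface CompiledCode { void run(); }

    private static final int COMPILE_THRESHOLD = 10_000;   // made-up number
    private final Map<String, Integer> invocationCounts = new HashMap<>();
    private final Map<String, CompiledCode> codeCache = new HashMap<>();

    void invoke(String method, Bytecode bytecode) {
        CompiledCode compiled = codeCache.get(method);
        if (compiled != null) {
            compiled.run();                                 // fast path: run native code
            return;
        }
        int count = invocationCounts.merge(method, 1, Integer::sum);
        if (count >= COMPILE_THRESHOLD) {
            // Hot: compile using the profile gathered so far and cache the result.
            codeCache.put(method, jitCompile(bytecode));
        }
        interpret(bytecode);                                // slow path: interpret
    }

    // Placeholders standing in for the real machinery.
    private CompiledCode jitCompile(Bytecode b) { return () -> { /* emitted native code */ }; }
    private void interpret(Bytecode b) { /* dispatch loop as sketched earlier */ }
}
```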
This is also the reason why interpreted languages are generally slow: they have to compile on the fly every time.
It doesn't make sense to talk about a language being slow or fast. Languages aren't slow or fast. A language is just a piece of paper.
A particular piece of code running on a particular version of a particular execution engine of a particular language in a particular environment on a particular piece of hardware under a particular set of circumstances may or may not be slower than another particular piece of code running on another particular version of another particular execution engine of another particular language in another particular environment on another particular piece of hardware under another particular set of circumstances, but that has nothing to do with the language.
In general, performance is mostly a question of money, and to a lesser extent a question of the execution environment. Running a particular piece of code written in C++ compiled with Microsoft Visual C++ on Windows on a MacPro is indeed likely to be faster than running a similar piece of code written in Ruby executed by YARV on Windows on a MacPro.
However, the main reason for that is that Microsoft is a giant company that has poured huge amounts of money, research, engineering, manpower, and other resources into Visual C++, whereas YARV is mostly a volunteer effort. Also, most mainstream operating systems like Windows, macOS, Linux, the various BSDs and Unices, etc. and most mainstream CPU architectures such as AMD64, x86, PowerPC, ARM, SPARC, MIPS, Super-H, and so on, are optimized for speeding up programs in C-like languages and have far fewer optimizations for Smalltalk-like languages. In fact, some features even actively hurt such languages (e.g. Virtual Memory can increase garbage-collection latencies significantly, even though it is totally useless in a memory-managed language).
However, all of this makes no sense to me. Why not download the Java byte codes to my machine, have the JVM compile them for my specific architecture once and create an executable file, and then, the next time I want to run the program, just run the compiled executable file?
If that is what you want, there is nobody stopping you. That is exactly what Excelsior JET does, for example. Nobody is forcing you to use IBM J9 or Oracle HotSpot.
I know that while compiling, the JVM does some clever dynamic optimizations; however, isn't their purpose only to compensate for the slowness of the interpretation mechanism?
Those dynamic optimizations are only possible precisely because they are dynamic. There are several fundamental impossibility results in computer science that severely restrict the kinds of optimizations a static ahead-of-time compiler can do: the Halting Problem, Rice's Theorem, and so on.
For example, inlining in a language like Java requires Class Hierarchy Analysis. In other words, the compiler needs to prove that a method is not overridden in order to be able to inline it. As it turns out, Class Hierarchy Analysis in a language with dynamic loading is equivalent to solving the Halting Problem. Ergo, a static compiler can only inline in a limited number of cases; it cannot, in the general case, tell whether a method is overridden or not.
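A concrete illustration (the class names Greeter and App are invented for this example): as far as an ahead-of-time compiler can see, greet() is never overridden, so inlining the call in run() looks safe; but code loaded dynamically at runtime may define a subclass that does override it, and the compiler cannot rule that out.

```java
// Hedged illustration of why Class Hierarchy Analysis is hard in the presence
// of dynamic loading. All class names are invented for the example.
class Greeter {
    public String greet() { return "hello"; }
}

class App {
    static String run(Greeter g) {
        // An AOT compiler would like to inline g.greet() here, but it may only
        // do so if it can prove that no subclass of Greeter overrides greet().
        return g.greet();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(new Greeter()));

        // Dynamic loading: a class that was not visible at compile time may be
        // loaded now, and it may override greet(), invalidating the assumption.
        Class<?> c = Class.forName(args.length > 0 ? args[0] : "Greeter");
        System.out.println(run((Greeter) c.getDeclaredConstructor().newInstance()));
    }
}
```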
A dynamic JIT compiler that compiles code at runtime doesn't need to prove that a method isn't overridden. It doesn't need to statically compute what the class hierarchy will be at runtime, because the class hierarchy is right there: it can simply look at it and ask, is this method overridden or not? Therefore, a dynamic compiler can inline in far more cases than a static compiler.
But there is more. A dynamic compiler can also perform de-optimization. Now, you might wonder: why would you want to de-optimize? Why make the code worse? Well, here's why: if you know you can de-optimize, then you can make optimizations based on guesses, and when it turns out you guessed wrong, then you can just remove the optimization again.
Keeping with our inlining example: unlike a static compiler, our dynamic compiler can determine with 100% accuracy whether a method is overridden or not. What it cannot necessarily know is whether the overriding method will ever be called. If the overriding method never gets called, then it is still safe and legal to inline the superclass method! So, what our clever dynamic compiler can do is inline the superclass method anyway, but put a little type check at the beginning which ensures that if the receiver object is ever of the subclass type, we de-optimize back to the un-inlined version. This is called speculative inlining, and it is something a static AOT compiler fundamentally cannot do.
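Conceptually, the code the JIT emits for such a speculatively inlined call looks roughly like the following sketch (pseudo-Java rather than actual JIT output, reusing the hypothetical Greeter class from above; a real VM de-optimizes by transferring control back to the interpreter rather than by falling back to a virtual call, but the guard-plus-fallback shape is the same).

```java
// Hedged sketch of speculative inlining with a guard check, reusing the
// hypothetical Greeter class from the previous example.
class SpeculativeInliningSketch {
    static String run(Greeter g) {
        if (g.getClass() == Greeter.class) {
            return "hello";              // body of Greeter.greet(), inlined
        } else {
            // Guard failed: a subclass instance showed up, so "de-optimize"
            // back to the generic, un-inlined virtual call.
            return g.greet();
        }
    }
}
```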
Polymorphic Inline Caching is an even more sophisticated optimization that modern high-performance language execution engines such as HotSpot, Rubinius, or V8 perform.
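Roughly, a polymorphic inline cache gives each call site its own small cache from receiver class to resolved target, so the expensive method lookup happens only the first time a given class shows up at that site. The sketch below models this with reflection and a HashMap purely for illustration (the class and method names are assumptions); real engines generate the cache directly in machine code.

```java
// Hedged sketch of a polymorphic inline cache for one call site. Real VMs
// build this cache in generated machine code; a HashMap plus reflection is
// only a model of the idea: remember receiver class -> resolved method.
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

class PolymorphicCallSite {
    private final Map<Class<?>, Method> cache = new HashMap<>();

    Object callGreet(Object receiver) throws Exception {
        Method target = cache.get(receiver.getClass());
        if (target == null) {
            // Slow path, taken once per receiver class seen at this site:
            // do the full virtual method lookup, then cache the result.
            target = receiver.getClass().getMethod("greet");
            cache.put(receiver.getClass(), target);
        }
        // Fast path on later calls with an already-seen receiver class.
        return target.invoke(receiver);
    }
}
```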
I mean, if the JVM could compile once and then run that compiled code multiple times, wouldn't this outweigh the speed-up of the optimizations done by the JVM?
No: those dynamic optimizations are fundamentally impossible for a static, ahead-of-time optimizer, so a compile-once scheme cannot simply reproduce their speed-ups.