Do all languages end up as assembly code in order to be executed?

Question

Sorry if this is a dumb or obvious question; if so, I will delete it. I've been searching and can't find a definitive answer. Here it goes: are high-level languages like PHP, Ruby, or Java all decoded to assembly in order to be executed by the CPU?

Not sure how PHP/ruby (IIRC ruby is interpreted) works, but java is converted into bytecode which is then processed by the JVM — Kevin L, Jul 23 '14 at 18:24
@KevinL The JVM can JIT-compile the bytecode into native assembly, depending on how frequently the section of code in question is being run. The JITting is more aggressive for `-server` VMs (compared to `-client`). — C. K. Young, Jul 23 '14 at 18:25
The examples you use all started out as interpreted languages. A good strategy to get a language easily ported to many operating systems. But that made them slow and gave people a reason to do something about it. Examples are Rubinius for Ruby, HipHop for PHP, Hotspot for Java. — Hans Passant, Jul 23 '14 at 18:48
Depends on your point of view. CPUs only ever execute machine code; so the sequence of your mouse clicks in your word processor causes a certain sequence of machine codes to be executed. The same is true for *interpreted* languages: The interpreter is made of machine code but the program it runs is ASCII and based on that the interpreter only decides which of its included sequences of machine code it should run in which order. So, strictly, no, interpreted languages are not *compiled* to machine code. — JimmyB, Jul 23 '14 at 18:48
Very similar: http://stackoverflow.com/questions/24918634/what-type-of-assembly-do-c-compilers-use/24918819?noredirect=1#comment38718057_24918819 — Seva Alekseyev, Jul 23 '14 at 19:14
It's a smart question, and people are going to tell you that the world falls neatly into compilers and interpreters. Reality is much more complex. There are execution systems that produce only machine code and others that produce none at all and many, many variations between: Partial compilation systems, JIT compilation systems, bytecodes, treecodes, and ... — Gene, Jul 23 '14 at 23:39
Usually all are translated into [intermediate languages](http://en.wikipedia.org/wiki/Intermediate_language), e.g. [three address code](http://en.wikipedia.org/wiki/Three_address_code), which IIRC I first read about in the [dragon book](http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools). It says that when translating the intermediate language into the __target lang__, you are not restricted to assembly; you could translate to C++ if you really wanted to. — James, Jul 23 '14 at 23:59

didierc · Answer 1 · 2014-07-23T23:53:43.357

Strictly speaking, assembly code is code written in assembly language, which is not machine code but a symbolic representation of it. Anything executed on a computer must go through a processing unit (CPU, GPU, or some other kind of processing unit), and must therefore be encoded in the machine code that unit understands. Native programs are programs expressed in the host's machine code.
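To make the assembly versus machine code distinction concrete, here is a minimal sketch of what an assembler does. `ENCODINGS` and `assemble` are hypothetical names invented for this illustration; the three byte values are the standard one-byte x86 encodings of `nop`, `ret`, and `int3`. A real assembler also handles operands, labels, and much more.

```python
# Toy "assembler": maps assembly mnemonics (text) to machine code (bytes).
# Only three operand-less x86 instructions are covered; ENCODINGS and
# assemble are illustrative names, not part of any real tool.
ENCODINGS = {
    "nop": b"\x90",   # do nothing
    "ret": b"\xc3",   # return from procedure
    "int3": b"\xcc",  # breakpoint trap
}

def assemble(listing):
    """Translate one mnemonic per line into the bytes a CPU would fetch."""
    machine_code = bytearray()
    for line in listing.splitlines():
        mnemonic = line.split(";")[0].strip()  # drop comments and whitespace
        if mnemonic:
            machine_code += ENCODINGS[mnemonic]
    return bytes(machine_code)

assembly_source = """
nop        ; symbolic form, meant for humans
nop
ret
"""
machine_code = assemble(assembly_source)
print(machine_code.hex())  # 9090c3
```

The text on the left of the table is assembly; the bytes on the right are what the processing unit actually decodes.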

Now, there are two categories of programs handling source code:

- interpreters: they read the source code and execute routines (written in machine code) built into the interpreter itself to simulate the instructions of the source language;
- compilers: they apply a series of transformations to the source code (what you meant by the word “decoded” in your question) in order to obtain something more easily executable:
  - a byte compiler transforms the code into something called bytecode, i.e., a language resembling machine code but artificial, since it will be interpreted later on (each individual instruction is mapped to a native routine);
  - a native compiler produces machine code, i.e., something understood by a real machine (usually the same machine that runs the compiler).
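The two categories above can be sketched on a toy language. Everything here is invented for illustration: the source language is postfix arithmetic, and `interpret`, `byte_compile`, and the `PUSH`/`ADD`/`MUL` opcodes belong to no real system.

```python
# Toy source language: space-separated postfix arithmetic, e.g. "2 3 + 4 *".
# All names here (interpret, byte_compile, PUSH, ADD, MUL) are made up.

PUSH, ADD, MUL = 0, 1, 2  # opcodes of an invented stack machine

def interpret(source):
    """Interpreter: act on each token of the source as soon as it is read."""
    stack = []
    for token in source.split():
        if token == "+":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif token == "*":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            stack.append(int(token))
    return stack.pop()

def byte_compile(source):
    """Byte compiler: translate tokens into (opcode, operand) pairs
    without executing anything."""
    bytecode = []
    for token in source.split():
        if token == "+":
            bytecode.append((ADD, None))
        elif token == "*":
            bytecode.append((MUL, None))
        else:
            bytecode.append((PUSH, int(token)))
    return bytecode

source = "2 3 + 4 *"
print(interpret(source))     # 20
print(byte_compile(source))  # [(0, 2), (0, 3), (1, None), (0, 4), (2, None)]
```

The interpreter produces a result and nothing else; the byte compiler produces a program in another, simpler language that still needs something to run it. A native compiler would follow the same shape as `byte_compile` but emit instructions for a real CPU.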

For most scripting languages, such as PHP, Ruby, and Python, the source is either interpreted directly or byte-compiled, with the resulting bytecode then interpreted.
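In Python's case you can look at that bytecode yourself. This is a small sketch assuming the CPython implementation and its standard `dis` module; the function `add` is just an example, and the exact instruction names vary between CPython versions, so no particular listing is shown.

```python
import dis

def add(a, b):
    return a + b

# CPython has already compiled the function body to bytecode;
# the code object attached to the function carries it.
raw_bytecode = add.__code__.co_code  # the raw bytes the interpreter loop reads
opnames = [ins.opname for ins in dis.get_instructions(add)]

print(raw_bytecode.hex())
print(opnames)  # exact instruction names differ between CPython versions
```

None of these instructions is machine code for your CPU; each one is handled by a routine inside the CPython interpreter.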

The compilation phase may be:

- transparent: PHP does exactly that; it takes PHP source files, byte-compiles them, and runs them directly. Some PHP extensions allow caching the bytecode to save time when the script must be executed again;
- explicit: Java requires source files to be compiled first to the bytecode format understood by the JVM, and will not try to interpret the source itself (this is not true of every language supported by the JVM: Scala, for instance, has an interpreter for interactive sessions).
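Python happens to expose both styles through its built-in `exec` and `compile` functions, which makes the difference easy to see. This is only an analogy for the PHP and Java behaviour described above, not how either of them is implemented; the variable names are arbitrary.

```python
source = "result = sum(n * n for n in range(10))"

# Transparent: exec() byte-compiles the string and runs it in one step.
ns_transparent = {}
exec(source, ns_transparent)
print(ns_transparent["result"])  # 285

# Explicit: compile once to a code object, then run it as often as needed
# without parsing the source again (the idea behind bytecode caches).
code_object = compile(source, "<example>", "exec")
for _ in range(3):
    ns_explicit = {}
    exec(code_object, ns_explicit)
print(ns_explicit["result"])  # 285
```

Reusing `code_object` skips the parsing and compiling work on every run after the first, which is roughly what a bytecode cache buys you.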

Other languages provide different solutions: OCaml, for instance, has both a byte compiler and a native compiler targeting the host machine code. There are also other compilers for it aiming at other platforms, such as JavaScript and the JVM.

Python likewise supports several platforms through different implementations, some executing bytecode and others producing native code.

More recently, PHP also gained a few more options with HHVM and Hack.

Finally, note that the distinction between bytecode and real machine code can be really thin. For instance, there are CPUs able to execute JVM bytecode directly (either as their main machine language or as an extension). There are also programs that simulate CPUs (see for instance QEMU), and CPUs able to understand other architectures: the Transmeta line of CPUs executed x86 code by translating it on the fly into their own machine code.

rlam12 · Answer 2 · 2014-07-24T13:31:50.613


In the end, yes. Interpreted languages are no different: each construct in the interpreted code corresponds to machine code found in the interpreter.

The steps to do this, however, can be complex. The script is read by the interpreter (hence the name) and transformed into some kind of bytecode (or compiled into bytecode that is then interpreted), which is assembler code for a virtual processor defined by the interpreter. Each op-code of this virtual machine makes the interpreter call some kind of function or functions in the interpreter's code (which is already machine code).
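As a rough sketch of that last step, here is a tiny virtual machine whose op-codes are looked up in a table of handler functions. The op-code names, `DISPATCH`, and `run` are all made up for this example, and the Python handlers only stand in for the machine code routines a real interpreter would contain.

```python
# Each op-code of the invented virtual machine maps to one handler.
# In a real interpreter these handlers are machine code routines that
# were compiled as part of the interpreter itself.

def op_push(stack, operand):
    stack.append(operand)

def op_add(stack, operand):
    right, left = stack.pop(), stack.pop()
    stack.append(left + right)

def op_mul(stack, operand):
    right, left = stack.pop(), stack.pop()
    stack.append(left * right)

DISPATCH = {"PUSH": op_push, "ADD": op_add, "MUL": op_mul}

def run(bytecode):
    """Fetch each (opcode, operand) pair and call the matching handler."""
    stack = []
    for opcode, operand in bytecode:
        DISPATCH[opcode](stack, operand)
    return stack.pop()

# Bytecode for (2 + 3) * 4, written by hand here.
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
result = run(program)
print(result)  # 20
```

Note that `program` is never turned into machine code: the only machine code that runs is the handlers, selected in the order the bytecode dictates.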

A very good example of how this is done and how to implement your very own bytecode interpreter is given here

edited Jul 24 '14 at 13:31

answered Jul 23 '14 at 19:17


This is absolutely wrong. A pure interpreter runs on a tree or bytecode produced from the source. While the interpreter itself is of course a machine-level program, it never needs to translate its input to machine code in order to execute it. – Gene Jul 23 '14 at 23:37
But it needs machine-executable code which is associated with the interpreted tokens. The term "translation" may be ambiguous in this context. – Deleted User Jul 24 '14 at 00:38
