You're misunderstanding things a bit. Let's start by explaining the foundation of how computers work internally. I'll use simple and practical concepts here. For the underlying theories, read about Turing machines. So, what's your machine made up of? All computers have two basic components: a processor and a memory.
The memory is a sequential group of "cells" that works sort of like a table. If you "write" a value into the Nth cell, you can later retrieve that same value by "reading" from the Nth cell. This is how computers "remember" things. If a computer is to perform a calculation, it needs to fetch the input data from somewhere and put the output data somewhere. That somewhere is the memory. In practice, the memory is what we call RAM, short for random-access memory.
Then we have the processor. Its job is to perform the actual calculations on memory. Which operations it performs is dictated by a program, that is, a series of instructions the processor is able to understand and execute. The processor decodes and executes one instruction, then the next one, and so on until the program halts (stops) the machine. If the instruction is "add cell #1 and cell #2 and store the result in cell #3", the processor will grab the values at cells 1 and 2, add them together, and store the result into cell 3.
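If it helps to make that concrete, here is a minimal C sketch of the same idea, with an ordinary array playing the role of the memory cells (the cell numbers are just the ones from the example above, nothing more):

#include <stdio.h>

int main(void)
{
    /* A toy "memory": a sequential row of numbered cells. */
    int memory[8] = {0};

    memory[1] = 2;   /* write the value 2 into cell #1 */
    memory[2] = 3;   /* write the value 3 into cell #2 */

    /* "add cell #1 and cell #2 and store the result in cell #3" */
    memory[3] = memory[1] + memory[2];

    printf("cell #3 = %d\n", memory[3]);   /* reads back 5 */
    return 0;
}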
Now we come to an interesting question: where is the program itself stored, if anywhere? It can't simply be hardwired into the circuitry; otherwise, the system would be no more of a computer than your microwave. There are two classic approaches to this problem: the Harvard architecture and the von Neumann architecture.
Basically, in the Harvard architecture, the data is stored in the memory, as usual, while the code (or program) is stored somewhere else, usually in read-only memory. In the von Neumann architecture, code is stored in memory and is just another form of data. As a result, code is data, and data is code. It's worth noting that most modern systems use the von Neumann architecture for several reasons, including the fact that it is the only practical way to implement just-in-time compilation, an essential part of the runtime systems for modern bytecode-based programming languages such as Java.
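To illustrate the "code is data" point, here is a small sketch that reads a function's own machine code back as plain bytes. It is not strictly portable C (casting a function pointer to a data pointer is outside the standard), but on typical desktop von Neumann systems it works, precisely because the compiled code of add() sits in ordinary, readable memory:

#include <stdio.h>

static int add(int a, int b) { return a + b; }

int main(void)
{
    /* Reinterpret the function's address as a pointer to raw bytes.
       On a Harvard machine with separate code memory, this kind of
       access wouldn't even make sense. */
    const unsigned char *p = (const unsigned char *)add;

    /* Dump the first few bytes of add()'s machine code. */
    for (int i = 0; i < 8; i++)
        printf("%02x ", p[i]);
    putchar('\n');
    return 0;
}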
We now know what the machine does and how it does it. But how are both data and code actually stored? What's the "underlying format", and how is it to be interpreted? You've probably heard of this thing called the binary numeral system. In our usual decimal numeral system, we have ten digits, zero through nine. But why exactly ten digits? Couldn't there be eight, or sixteen, or sixty, or even just two? (A unary, base-1 system isn't practical for building a computational machine, so two is as low as you can reasonably go.)
Have you heard that computers are "logical and cold"? Both are true... unless your machine has an AMD processor or a special kind of Pentium. Logic theory states that every logical predicate can be reduced to either "true" or "false"; that is to say, "true" and "false" are the basis of logic. Besides, computers are made up of electrical cruft, no? A light switch is either on or off, no? So, at the electrical level, we can easily distinguish two voltage levels, right? And we want to handle logical stuff, such as numbers, in computers, right? So zero and one it is; not only do they work, they're the only really feasible choice.
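As a tiny illustration, here is a C sketch that prints the binary digits of an ordinary decimal number, just to show that the two notations are the same value written with different sets of digits:

#include <stdio.h>

int main(void)
{
    unsigned int n = 13;   /* thirteen, written in our usual decimal notation */

    /* Print the same value using only the digits 0 and 1:
       13 = 8 + 4 + 1, so this prints 00001101. */
    for (int bit = 7; bit >= 0; bit--)
        putchar(((n >> bit) & 1u) ? '1' : '0');
    putchar('\n');
    return 0;
}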
Now, with all that theory in mind, let's talk about programming languages and assembly languages. An assembly language is a way to express binary instructions in a form that is (supposedly) readable to human programmers. For instance, something like this...
ADD 0, 1 # Add cells 0 and 1 together and store the result in cell 0
Could be translated by an assembler into something like...
110101110000000000000001
Both are equivalent, but humans will only understand the former, and processors will only understand the latter.
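If you want to see what such a translation could look like in code, here is a toy C sketch of an "assembler" for that single instruction. The instruction format is completely made up (an 8-bit opcode followed by two 8-bit cell numbers, with 0xD7 as a hypothetical opcode for ADD), chosen only so the output matches the 24-bit string above:

#include <stdio.h>

/* Made-up 24-bit instruction format: 8-bit opcode, then two 8-bit cell numbers. */
enum { OP_ADD = 0xD7 };   /* hypothetical encoding for ADD */

static unsigned int assemble_add(unsigned int dst, unsigned int src)
{
    return (OP_ADD << 16) | (dst << 8) | src;
}

int main(void)
{
    unsigned int word = assemble_add(0, 1);   /* "ADD 0, 1" */

    /* Dump the 24 bits our toy "assembler" produced:
       110101110000000000000001 */
    for (int bit = 23; bit >= 0; bit--)
        putchar(((word >> bit) & 1u) ? '1' : '0');
    putchar('\n');
    return 0;
}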
A compiler is a program that translates input data that is expected to conform to the rules of a given programming language into another, usually lower-level form. For instance, a C compiler may take this code...
x = some_function(y + z);
And translate it into assembly code such as (of course this is not real assembly, BTW!)...
# Assume x is at cell 1, y at cell 2, and z at cell 3.
# Assume that, when calling a function, the first argument
# is at cell 16, and the result is stored in cell 0.
MOVE 16, 2
ADD 16, 3
CALL some_function
MOVE 1, 0
And the assembler will spit out something like (no, this is not random)...
11101001000100000000001001101110000100000000001110111011101101111010101111101111110110100111010010000000100000000
Now, let's talk about another language, namely Java. Java's compiler does not give you assembly or raw machine code; it gives you bytecode. Bytecode is... something like a generic, higher-level form of assembly language that the CPU itself can't understand (there are exceptions), but that another program running directly on the CPU does. This is why the claim that some badly informed people spread around, that "both interpreted and compiled programs ultimately boil down to machine code", is false. If, for example, the interpreter is written in C and contains this line of code...
Bytecode some_bytecode;
/* ... */
execute_bytecode(&some_bytecode);
(Note: I won't translate that into assembly/binary again!) The processor executes the interpreter, and the interpreter executes the bytecode by performing the actions the bytecode specifies. This indirection can severely degrade performance if not optimized well, but that isn't the real problem per se; the real problem is that things such as reflection, garbage collection, and exceptions add quite a bit of overhead. For embedded systems, whose memories are small and whose processors are slow, that is the last thing you want: you'd be wasting precious system resources on things you don't need. If C programs are already slow on your Arduino, imagine a full-blown Java/Python program with all sorts of bells and whistles! Even if you translated the bytecode into machine code before putting it on the device, support for all that extra machinery (reflection, exceptions, garbage collection, and so on) would still have to be there, resulting in basically the same unwanted overhead and waste.
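To make the "another program executes the bytecode" part concrete, here is a minimal, hypothetical sketch of what such an interpreter's core loop could look like. The opcodes and the execute_bytecode signature are invented for illustration (they're not from any real JVM); the point is that the CPU only ever runs the C code, while the bytecode is just data being read:

#include <stdio.h>

/* A made-up bytecode: each instruction is one byte. */
enum { OP_PUSH = 0, OP_ADD = 1, OP_PRINT = 2, OP_HALT = 3 };

typedef unsigned char Bytecode;

static void execute_bytecode(const Bytecode *code)
{
    int stack[16];
    int sp = 0;      /* stack pointer */
    int pc = 0;      /* program counter: index of the next instruction */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:  stack[sp++] = code[pc++]; break;           /* push the next byte */
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;   /* add the two top values */
        case OP_PRINT: printf("%d\n", stack[sp - 1]); break;      /* print the top value */
        case OP_HALT:  return;                                    /* stop the program */
        }
    }
}

int main(void)
{
    /* "push 2, push 3, add, print, halt" */
    const Bytecode program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    execute_bytecode(program);   /* prints 5 */
    return 0;
}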
In most other environments, this is not a big deal, as memory is cheap and abundant, and processors are fast and powerful. Embedded systems have special needs; they're special by themselves, and things are not free in that land.