Compiling high-level language to machine code

Question

After reading some answers on this site and a few other sources, my understanding was that the compiler converts a high-level language (C++, for example) directly to machine code. The computer itself has no need for assembly; the compiler only emits assembly so that the user can view the code and have more control over it if needed.

But the slide below comes from one of my lecture sheets, so I would appreciate it if someone could explain further and tell me whether I am wrong or the slide is.

Slide

Out of interest... Who gave you those slides? They are so wrong it's almost funny — Kris, Jul 25 '14 at 21:37
The lecturer might have had managed languages (like Java or C#) in mind, which compile to a machine-independent bytecode that is then translated to native machine code at runtime. This is of course still no excuse for such an utterly misleading slide. — ComicSansMS, Jul 25 '14 at 21:40
@ComicSansMS: But then they shouldn't call out C++ as the example in the slide. — Bill Lynch, Jul 25 '14 at 21:40
@sharth Of course. The slides now are just complete nonsense. I was just trying to reconstruct what might have driven the lecturer to make such absurd claims. — ComicSansMS, Jul 25 '14 at 21:41
Assembly is arbitrary for computers. To them it's nothing more than a bunch of `ctrl-h MOV 10110110`; it's the same thing. Assembly as a tool and concept is just for humans. — Charles Clayton, Jul 25 '14 at 21:42
If you got the slide from your lecturer, drop out immediately. — n. m. could be an AI, Jul 25 '14 at 21:45
@n.m. Not just immediately. Punch the lecturer before leaving :P ... @_OP Well seriously, your lecture might refer to [tag:c++-cli] rather than plain [tag:c++]. — πάντα ῥεῖ, Jul 25 '14 at 22:01
Unfortunately I got this from my "Doctor" at the university, and I now really doubt that he is a doctor... @πάνταῥεῖ I am working on a bomb, no need to worry about wrong slides anymore! — Karim K., Jul 25 '14 at 22:08
@KarimK. Be cautious throwing a bomb or trying to slap your teacher! You've presented this slide without any additional context. It may make (some) sense within this context. But I'd still say, it's not a very good explanation, for whatever the doc wanted to teach you. — πάντα ῥεῖ, Jul 25 '14 at 22:16
Try applying this slide to interpretive languages like BASIC or LISP. It doesn't work because the OS doesn't touch your program, the interpreter does. — Thomas Matthews, Jul 25 '14 at 23:33
@πάνταῥεῖ http://i.imgur.com/vfpXVFn.png ... Do you still think so? I obtained this from another slide by the way. — Karim K., Jul 26 '14 at 00:48

score 23 · Accepted Answer · answered Jul 25 '14 at 21:36

Your slide is mostly wrong...

There is an essentially 1-to-1 mapping between assembly instructions and machine code instructions. Assembly is a textual representation of the information, and machine code is a binary representation.
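
To make the text/binary correspondence concrete, here is a minimal sketch of a toy lookup table. The four encodings are real single-byte x86 opcodes that take no operands, but the function names (`opcode_table`, `assemble`, `disassemble`) are invented for this illustration, and a real assembler also has to deal with operands, prefixes and multiple encodings per mnemonic.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Toy table: a few real single-byte x86 opcodes that take no operands.
// Hypothetical helper, not part of any real assembler.
const std::map<std::string, std::uint8_t>& opcode_table() {
    static const std::map<std::string, std::uint8_t> table = {
        {"nop", 0x90}, {"ret", 0xC3}, {"int3", 0xCC}, {"hlt", 0xF4},
    };
    return table;
}

// "Assemble": text mnemonic -> machine-code byte, or -1 if unknown.
int assemble(const std::string& mnemonic) {
    const auto& table = opcode_table();
    const auto it = table.find(mnemonic);
    return it == table.end() ? -1 : it->second;
}

// "Disassemble": machine-code byte -> text mnemonic, or "" if unknown.
std::string disassemble(std::uint8_t byte) {
    for (const auto& entry : opcode_table()) {
        if (entry.second == byte) return entry.first;
    }
    return "";
}
```

Going from mnemonic to byte and back loses nothing, which is the sense in which assembly and machine code are two representations of the same information.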

Some machines, however, support additional assembly instructions, but which instructions end up in the generated assembly code is still determined at compile time, not at runtime. Generally speaking, the available instructions are determined by the processor in the system (Intel, AMD, TI, NVIDIA, etc.), not by the manufacturer that you purchase the whole system from.
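
One way to see that the target is fixed at compile time is to look at the predefined macros a compiler sets for the architecture it is generating code for. The sketch below assumes the usual GCC/Clang macros (`__x86_64__`, `__i386__`, `__aarch64__`, `__arm__`) and their MSVC counterparts (`_M_X64`, `_M_IX86`, `_M_ARM64`, `_M_ARM`); the function name `target_architecture` is made up for this example.

```cpp
#include <string>

// Returns the architecture this translation unit was compiled for.
// The answer is baked in by the preprocessor; nothing is decided at runtime.
std::string target_architecture() {
#if defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86 (32-bit)";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "AArch64";
#elif defined(__arm__) || defined(_M_ARM)
    return "ARM (32-bit)";
#else
    return "unknown";
#endif
}
```

The same source gives a different answer only when it is recompiled for a different processor architecture, regardless of whether the box says Toshiba, Dell or HP.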

Bill Lynch


"Every machine like Toshiba, Dell or HP has its own machine code blabla" (on the left) is also complete bs. – quantdev Jul 25 '14 at 21:45
Alright, that cleared up so many doubts! But I have another question that I couldn't find an answer to, about this part: "Every Machine like, Toshiba and Dell or HP, has it own machine codes". Let's say I write some C++ code: at which step is it converted to the machine code for a particular processor (Intel/AMD)? Doesn't each processor family or brand have its own machine code? – Karim K. Jul 25 '14 at 21:45
@KarimK. Like sharth explained, each **processor architecture** has its own set of assembly instructions. An Apple Macintosh and an HP could be using the same architecture with the same instructions, so compiling the same C++ code on each would give you the same Assembly. – scohe001 Jul 25 '14 at 21:47
@Josh, do you mean "output file" when you said "Assembly" there at the end of your comment? That usage might add a slight bit of confusion. – Steve Jul 25 '14 at 21:50
@KarimK. In addition, once code is compiled into assembly, it is then assembled by an assembler, which creates a binary file your processor can read and execute. – IllusiveBrian Jul 25 '14 at 21:50
@Namfuak: Your operating system reads the binary file. Your processor doesn't know what to do with ELF, COFF, etc. – Bill Lynch Jul 25 '14 at 21:56
@Namfuak: The assembly part isn't even necessary -- it just makes it so that the compiler can leverage an existing assembler (read: be lazy), and makes it easier for humans to see what's being generated. But it's entirely possible to skip that step, and a number of compilers do skip it unless you ask them to translate to assembly. – cHao Jul 25 '14 at 21:56
@Josh So a C++ compiler for, let's say, an x64 processor is different from a C++ compiler for an x32 processor? – Karim K. Jul 25 '14 at 22:04
@KarimK. _'So a C++ compiler for ...'_ Yes the machine code generating backends are different and need to be specified. The frontend handling the parsing of the language used may still be the same. – πάντα ῥεῖ Jul 25 '14 at 22:10
@cHao: Assembly language listings are very useful, especially in the Embedded Systems Domain. They can be used when optimizing, to see how the compiler generated code. They are also useful for debugging when the compiler generated code that doesn't match the source listing. – Thomas Matthews Jul 25 '14 at 23:27

tohava · Answer 2 · 2014-07-25T22:27:39.873

This slide is confusing bytecode with textual assembly. Assembly is a human-readable version of either bytecode or machine code. Machine code is what the hardware can run directly. Bytecode is low level but generic, and is further compiled to machine code.

Some languages use bytecode, which is translated at runtime into even lower-level machine code. One example of this is Java, where class files will sometimes be compiled to machine code as a runtime optimization. Another is CUDA, where each NVIDIA GPU has a different instruction set, but the CUDA compiler generates bytecode that the CUDA driver for each GPU can then translate.
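
As a rough illustration of what "low level, but generic" means, here is a minimal sketch of a stack-based bytecode. The opcodes (`PUSH`, `ADD`, `MUL`, `HALT`) and the `run` function are invented for this example and do not correspond to Java or CUDA bytecode. Note also that this sketch interprets the bytecode, whereas the runtimes mentioned above translate it to native machine code; the point is only that the same byte sequence works on any CPU that has a runtime for it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bytecode: these opcodes are invented for this sketch.
enum Op : std::uint8_t { PUSH = 0, ADD = 1, MUL = 2, HALT = 3 };

// Executes the bytecode and returns the value on top of the stack.
// Assumes the program is well formed (no stack underflow, PUSH has an operand).
int run(const std::vector<std::uint8_t>& code) {
    std::vector<int> stack;
    std::size_t pc = 0;
    while (pc < code.size()) {
        switch (code[pc++]) {
        case PUSH:  // next byte is the literal to push
            stack.push_back(code[pc++]);
            break;
        case ADD: {  // pop two values, push their sum
            const int b = stack.back();
            stack.pop_back();
            stack.back() += b;
            break;
        }
        case MUL: {  // pop two values, push their product
            const int b = stack.back();
            stack.pop_back();
            stack.back() *= b;
            break;
        }
        case HALT:
        default:
            return stack.empty() ? 0 : stack.back();
        }
    }
    return stack.empty() ? 0 : stack.back();
}
```

For instance, the program `{PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT}` computes (2 + 3) * 4 without containing a single x86 or ARM instruction.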

Another option is that he is talking about how Intel processors translate machine code at runtime into internal microcode and then run it. This is completely invisible to software, though, including the OS.

+1 That very well explains, what I'm suspecting about this particular slide just now. — πάντα ῥεῖ, Jul 25 '14 at 22:19

score 4 · Answer 3 · answered Jul 25 '14 at 23:04

The slide is badly wrong in many ways.

A greatly simplified version of what actually happens in the example given in the slide (compiling C++) would explain that there are four phases of compilation to produce an executable from a source code file:

Preprocessing
Compilation “proper”
Assembly
Linking

In the preprocessing phase, preprocessor directives, such as #include and #define, are fully expanded and comments are stripped by the preprocessor, creating “postprocessed” C++. The slide omits this entirely.
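
A small sketch of what the preprocessor does to a macro follows. The macro `SQUARE` and the function `square_of_successor` are made-up names for this illustration; with GCC or Clang, the `-E` flag prints the preprocessed output so the expansion can be inspected.

```cpp
// Function-like macro: purely textual substitution, done before compilation proper.
#define SQUARE(x) ((x) * (x))

// This comment is stripped by the preprocessor, and the macro use below
// is replaced, so the compiler proper sees roughly:
//   return ((n + 1) * (n + 1));
int square_of_successor(int n) {
    return SQUARE(n + 1);
}
```

The compiler proper never sees the name `SQUARE`; it only sees the expanded expression.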

In the compilation “proper” phase, the postprocessed text from the previous phase is converted into assembly language by the compiler. It's unfortunate that we use the same term — compilation — for both the whole four-step procedure and this one step, but that's the way it is.

Contrary to the slide, assembly language statements are not “readable by the OS” nor are they converted to machine code at run-time. Rather, they are readable by the assembler, which does its job (next paragraph) at compile-time.

In the assembly phase, the assembly language statements from the previous phase are converted into object code (binary machine code instructions that the CPU understands, combined with metadata that the OS and the linker understand) by the assembler.

In the linking phase, the object code from the previous phase is linked with other object code files and common/system libraries to form an executable.
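
The four phases described above can be run one at a time with GCC (Clang accepts the same flags). The following is only a sketch: the file name `hello.cpp` and the function `add` are hypothetical, and the commands are shown as comments so that the block stays valid C++.

```cpp
// hello.cpp (hypothetical file name used for this walkthrough)
//
//   g++ -E hello.cpp -o hello.ii   // 1. preprocessing -> preprocessed C++
//   g++ -S hello.ii  -o hello.s    // 2. compilation   -> assembly text
//   g++ -c hello.s   -o hello.o    // 3. assembly      -> object code
//   g++ hello.o      -o hello      // 4. linking       -> executable
//
// Step 4 only succeeds if some object file being linked defines main().

// A trivial function, so there is something to look for in hello.s.
int add(int a, int b) {
    return a + b;
}
```

Adding something like `int main() { return add(2, 3) == 5 ? 0 : 1; }` to the file makes the last step link. Running plain `g++ hello.cpp -o hello` performs all four phases in one go, which is why the intermediate files are not normally seen.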

At runtime, the OS — in particular the loader — reads the executable into memory and performs run-time linking, where references to common/system libraries are resolved and those libraries are loaded into memory (if they're not already) so that your executable is able to use them.

A further error is that different brands of machine do not have their “own machine codes”. What determines which machine codes a machine understands is its CPU. If two machines have the same CPU (e.g. a Dell laptop and a Toshiba laptop with the same Intel Core i7-3610QM CPU), then they understand the same machine codes. Moreover, two CPUs with the same ISA (instruction set architecture) understand the same machine codes. Also, newer CPUs are generally backward-compatible with older CPUs in the same family. For example, a newer Intel i7 CPU understands all of the instructions that an older Intel Pentium 4 understands, but not vice versa.
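
Because newer CPUs add instructions that older ones lack, programs sometimes ask the CPU at runtime which extensions it has. The sketch below assumes GCC or Clang on x86, where the builtins `__builtin_cpu_init` and `__builtin_cpu_supports` are available; on any other compiler or architecture it simply reports `false`. The wrapper names `cpu_has_sse2` and `cpu_has_avx2` are invented for this example. SSE2 dates back to the Pentium 4, while AVX2 only appeared in much later processors.

```cpp
// True when the builtins used below are expected to be available.
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_CPU_BUILTINS 1
#else
#define HAVE_X86_CPU_BUILTINS 0
#endif

// Reports whether the CPU running this program supports SSE2.
bool cpu_has_sse2() {
#if HAVE_X86_CPU_BUILTINS
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;  // assumption: not detectable in this sketch
#endif
}

// Reports whether the CPU running this program supports AVX2.
bool cpu_has_avx2() {
#if HAVE_X86_CPU_BUILTINS
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;  // assumption: not detectable in this sketch
#endif
}
```

A CPU that reports AVX2 also reports SSE2, but not the other way round, which mirrors the backward compatibility described above. The brand of the laptop never enters into it.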

Hopefully, I've struck a somewhat better balance between simplicity and correctness than the slide, above, which fails miserably.

A minor comment: the *translation* phase can convert high level language statements directly into an object file or machine code, skipping the assembly language phase. Many compilers and translators do this (because of impatient people and build processes). — Thomas Matthews, Jul 25 '14 at 23:31
Sure, but this is the classical breakdown in a form that seems likely to be appropriate to the level of the OP. If I added every “if”, “and”, and “but”, and started talking about translation units and IPO, it would occupy at least one bookshelf rather than merely several paragraphs. — Emmet, Jul 25 '14 at 23:39
