11

Does the Just-In-Time (JIT) compiler really map each of the Common Intermediate Language (CIL) instructions in a program to the underlying processor's opcodes?

And if so, can we call CIL an assembly language and the JIT an assembler?

Note: Wikipedia doesn't list CIL as an assembly language in its list of assembly languages

StayOnTarget
Anirudha
  • 1
    Interesting question. I tried to reply, but it is not so easy. I think you can't consider it an assembly language, since there is no real CPU running it directly. – Felice Pollano Jul 28 '12 at 12:41
  • @FelicePollano then CIL maybe a partial assembly language..:) – Anirudha Jul 28 '12 at 12:47
  • 4
    Assembly language mnemonics correspond 1:1 with CPU specific machine code instructions. An assembler just maps the (sorta) human-readable assembly code to those instructions. This is definitely not the case with CIL. It's not partial, it just isn't - assembly language has a very clear definition. – Jamie Treworgy Jul 28 '12 at 13:04
  • @jamietre you are right, but then why do people call it an Object Oriented Assembly language? – Anirudha Jul 28 '12 at 13:09
  • I haven't heard it called that before, I think object-oriented bytecode would be more accurate – Jamie Treworgy Jul 28 '12 at 13:10
  • 1
    I'm not even sure if "object-oriented" is that important for CIL (even though the CLI's architecture clearly favours the OO paradigm). Its stack-based evaluation model is much more prominent, as is the emphasis on providing metadata besides bytecode. your typical assembly language wouldn't care about metadata at all. – stakx - no longer contributing Jul 28 '12 at 13:49
  • @stakx I'd say that having distinct static, instance and virtual methods and even an instruction specifically for virtual calls does make OO quite prominent. – svick Jul 28 '12 at 13:51
  • IL == Intermediate Language, so no. – leppie Jul 28 '12 at 15:59

4 Answers

10

This question is all about definitions, so let's define the terms properly. First, assembly language:

Assembly language is a low-level programming language for computers, microprocessors, microcontrollers, and other programmable devices in which each statement corresponds to a single machine language instruction. An assembly language is specific to a certain computer architecture, in contrast to most high-level programming languages, which generally are portable to multiple systems.

Now, CIL:

Common Intermediate Language is the lowest-level human-readable programming language defined by the Common Language Infrastructure (CLI) specification and is used by the .NET Framework and Mono. Languages which target a CLI-compatible runtime environment compile to CIL, which is assembled into an object code that has a bytecode-style format.

Okay, this part is technically not correct: for example, the C# compiler compiles directly to the bytecode and doesn't go through CIL (the human-readable language). But theoretically, we can imagine that's what's happening.

With these two definitions, CIL is an assembly language, because each statement in it is compiled down to a single bytecode instruction. The fact that there is no physical computer that can execute that bytecode directly doesn't matter.

The definition says that each assembly language is “specific to a certain computer architecture”. In this case, the architecture is the CLR virtual machine.
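To make the "each statement corresponds to a single instruction" point concrete, here is a minimal sketch of a toy assembler in Python. It is not how ilasm works internally; it only covers a handful of operand-less instructions and ignores labels, operands and metadata. The opcode byte values are the ones ECMA-335 assigns to these mnemonics; the names `OPCODES`, `assemble` and `ADD_METHOD_BODY` are made up for this illustration.

```python
# Toy assembler: every textual CIL mnemonic maps to exactly one opcode byte.
# Byte values follow ECMA-335; only a few operand-less instructions are listed.
OPCODES = {
    "nop": 0x00,
    "ldarg.0": 0x02,
    "ldarg.1": 0x03,
    "ldc.i4.1": 0x17,
    "add": 0x58,
    "ret": 0x2A,
}


def assemble(source: str) -> bytes:
    """Translate one mnemonic per line into one byte each."""
    out = bytearray()
    for line in source.splitlines():
        mnemonic = line.split("//")[0].strip()  # drop trailing comments
        if not mnemonic:
            continue                            # skip blank lines
        out.append(OPCODES[mnemonic])           # strict 1:1 lookup, no analysis
    return bytes(out)


# Body of a hypothetical static method that returns arg0 + arg1.
ADD_METHOD_BODY = """
ldarg.0   // push first argument
ldarg.1   // push second argument
add
ret
"""

if __name__ == "__main__":
    print(assemble(ADD_METHOD_BODY).hex(" "))  # prints "02 03 58 2a"
```

The translation is a plain table lookup, which is the property the definition of assembly language asks for.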


About JIT: the JIT compiler can't be considered an assembler: it doesn't do the 1:1 translation from human-readable form to bytecode, ilasm does that.

The JIT compiler is an optimizing compiler that compiles from bytecode to native machine code (for whatever ISA / CPU it's running on), while making optimizations.
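As a rough illustration of why that is not a 1:1 mapping, here is a small Python sketch that lowers stack-based, CIL-like instructions into register-style pseudo-instructions and folds constants along the way. Everything here is an assumption made for the example: the tuple encoding of the input, the `lower_to_registers` name and the three-operand output syntax are invented, and a real JIT does far more than this.

```python
from itertools import count


def lower_to_registers(cil):
    """Turn stack-based CIL-like ops into register-style pseudo-instructions.

    Pushes emit nothing by themselves: the operand is only recorded on a
    compile-time model of the stack, and constant operands are folded.
    """
    stack = []       # compile-time model of the evaluation stack
    code = []        # emitted pseudo-native instructions (hypothetical syntax)
    fresh = count()  # supplies register numbers r0, r1, ...
    for op, *args in cil:
        if op == "ldc.i4":
            stack.append(args[0])          # constant stays a Python int
        elif op == "ldarg":
            stack.append(f"arg{args[0]}")  # assume arguments live in registers
        elif op in ("add", "mul"):
            right, left = stack.pop(), stack.pop()
            if isinstance(left, int) and isinstance(right, int):
                # Both operands known at compile time: fold, emit nothing.
                stack.append(left + right if op == "add" else left * right)
            else:
                reg = f"r{next(fresh)}"
                code.append(f"{op} {reg}, {left}, {right}")
                stack.append(reg)
        elif op == "ret":
            code.append(f"ret {stack.pop()}")
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return code


# arg0 * (2 + 3), written as six stack instructions.
program = [("ldarg", 0), ("ldc.i4", 2), ("ldc.i4", 3), ("add",), ("mul",), ("ret",)]

if __name__ == "__main__":
    for line in lower_to_registers(program):
        print(line)
```

Six input instructions become two output instructions (`mul r0, arg0, 5` and `ret r0`), so there is no instruction-by-instruction correspondence left to speak of.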

Peter Cordes
svick
  • Virtual machines perform the JIT compile to specific machine instructions (x86, x64, ia64, etc). – Peter Ritchie Jul 28 '12 at 13:54
  • Part of the OPs question asked about the JIT. – Peter Ritchie Jul 28 '12 at 14:33
  • @PeterRitchie Right, thanks, I have added a short paragraph about JIT. – svick Jul 28 '12 at 15:26
  • @svick ilasm generates a portable executable (PE) which contains MSIL and the required metadata. How can it be an assembler? – Anirudha Jul 28 '12 at 16:11
  • @Anirudha Strictly speaking, the generated file doesn't contain CIL, it contains CIL *bytecode*. Which is exactly what assembler (the program) is supposed to do: translate a human-readable program 1:1 into a machine-readable code. Assembler doesn't have to compile to x86 machine code. – svick Jul 28 '12 at 16:15
  • @Anirudha Also, why exactly do you think that has to mean it's not an assembler? Which part of the definition of assembly language I gave above does that contradict? Or do you disagree with the definition? – svick Jul 28 '12 at 16:16
  • @svick How can we use ilasm to produce x86 and x64 machine code? Do we need to use some options while using ilasm? – Anirudha Jul 28 '12 at 16:17
  • 2
    @Anirudha What makes x86 machine code special? An assembler doesn't have to produce x86 code; for example, an ARM assembler doesn't do that either. And some C compilers do produce x86 machine code, but they are not assemblers. Being an assembler doesn't have anything to do with x86 machine code. – svick Jul 28 '12 at 16:21
  • Great answer. +1 for starting with terminology and work towards a logical conclusion from there. Perhaps it would be good to say sth. about how the additional generation of metadata besides instructions affects `ilasm` being an assembler (and thus CIL an assembly language), according to the terms' definitions...? – stakx - no longer contributing Jul 28 '12 at 17:15
4

Assembly is made up of mnemonics for the machine code instructions of a particular processor. A direct representation of the 1s and 0s that make the core execute code, but written in text to make it easy on a human. Which is very unlike CIL:

  • you can't buy a processor that executes CIL
  • CIL doesn't target a specific processor, the jitter does
  • CIL assumes a stack-based execution model, processors are primarily register based
  • CIL code is optimized from its original form
  • there is no one-to-one translation of a CIL instruction to a processor instruction

That last bullet is the key one. A design decision that makes CIL strongly different from Java bytecode is that CIL instructions are type-less. There is only one ADD instruction, but processors have many versions of it: specific ones that take byte, short, int, long, float and double operands. They are required because different parts of the processor core are used to execute the add. The jitter picks the right one, based on the type of the operands it infers from previous CIL instructions.

Just like the + operator in the C# language, it can also work with different operand types. That really makes the L in CIL significant: it is a Language. A simple one, but it is only simple to help make it easy to write a jitter for it.
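The operand-type inference can be sketched in a few lines of Python. This is a deliberately simplified model, not the actual jitter: it tracks only four operand types, glosses over how the CLI evaluation stack widens small integers and floats, and the names `PUSH_TYPES`, `X86_ADD` and `select_adds` are invented for the example. The x86 mnemonics themselves (`add`, `addss`, `addsd`) are real.

```python
# Type each CIL load instruction pushes onto the (modelled) evaluation stack.
PUSH_TYPES = {
    "ldc.i4": "int32",
    "ldc.i8": "int64",
    "ldc.r4": "float32",
    "ldc.r8": "float64",
}

# Typed x86-64 instruction a jitter could choose for the single CIL `add`.
X86_ADD = {
    "int32": "add r32, r32",
    "int64": "add r64, r64",
    "float32": "addss xmm, xmm",
    "float64": "addsd xmm, xmm",
}


def select_adds(cil):
    """Return the typed native add chosen for every type-less CIL `add`."""
    stack = []   # operand types only; values are irrelevant here
    chosen = []
    for op in cil:
        if op in PUSH_TYPES:
            stack.append(PUSH_TYPES[op])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            if left != right:
                raise TypeError(f"cannot add {left} and {right} in this sketch")
            chosen.append(X86_ADD[left])  # type inferred from earlier pushes
            stack.append(left)            # result has the operand type
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return chosen


if __name__ == "__main__":
    print(select_adds(["ldc.i4", "ldc.i4", "add"]))
    print(select_adds(["ldc.r8", "ldc.r8", "add"]))
```

The same `add` in the input yields `add r32, r32` in the first call and `addsd xmm, xmm` in the second, purely because of what was pushed before it.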

Hans Passant
  • 4
    Why does it matter that a physical processor exists? What if someone makes a physical processor for the CIL bytecode in the future, will that suddenly make CIL an assembly language? Also, does this mean that [MIXAL](http://en.wikipedia.org/wiki/MIX) is not an assembly language? – svick Jul 28 '12 at 15:59
  • Such a "what if" game isn't very productive. Fact is that there is no such processor; the last part of my answer pointed out a likely reason why we still don't have one. Even Java isn't there, Jazelle doesn't execute all byte codes. I promise that as soon as I have one in my machine that lets me post to SO, I'll use it to edit this answer. Could happen, let's see what Midori produces. – Hans Passant Jul 28 '12 at 16:19
  • My point is that a definition that relies on the existence of some specific piece of hardware is silly. Specific hardware doesn't make an assembly language, the mechanics of its compilation do. – svick Jul 28 '12 at 16:24
  • Hmm, no it does in the case of assembly. Your example of MIXAL requires an *emulator*, a chunk of software that emulates hardware. Knuth's fictitious processor in this case. That's only ever of academic interest (students learning MIPS for example), or interesting to execute ROMs of old games, emulators are too slow to ever be useful in general purpose computing. – Hans Passant Jul 28 '12 at 16:36
  • 5
    So you're saying that MIX Assembly Language is not an assembly language and that Knuth doesn't know what “assembly language” means? – svick Jul 28 '12 at 16:48
  • Emulators execute the 1s and 0s produced by an assembler. I don't particularly enjoy the direction this comment trail is heading, let's call it quits. – Hans Passant Jul 28 '12 at 17:03
2

The line is actually pretty blurry... the arguments I've seen against calling CIL an "assembly language" can apply almost as well to x86/x86-64 in practice.

Intel and AMD haven't made processors that execute assembly instructions exactly as emitted in decades (if ever), so even so-called "native" code is not much different from running on a virtual machine whose bytecode is specified in x86/x86-64.

x86/x86-64 are the lowest-level thing typical developers have access to, so if we had to put our foot down and call something in our ecosystem an "assembly language", that would win, and since CIL bytecode ultimately requires x86/x86-64 instructions to be able to run on a processor in that family, then there's a pretty strong case to be made that it indeed doesn't "feel" like it should count.

So in a sense, maybe neither can be considered to be "assembly language". When referring to x86/x86-64 processors, we almost never refer to processors that execute x86/x86-64 without translating it into something else (i.e., whatever the microcode does).

To add in yet another wrinkle, the way in which an x86/x86-64 processor executes a given sequence of instructions can change simply by updating the microcode. A quick search shows that Linux can even make it easy to do this yourself in software!

So I guess, here are criteria that can justify putting them in two separate categories:

  1. Does it matter that all current machines that run CIL bytecode are implemented in software?
  2. Does it matter that the same hardware can interpret the same x86/x86-64 instructions in a different way after being instructed to do so in software?
  3. Does it matter that we don't currently have a way of bypassing the microcode and issuing commands directly to the physical units of x86/x86-64 processors?

So regarding the "is CIL an assembly language" question, the best answers I can give are "it depends" (for scientists) and "pretty much" (for engineers).

Joe Amenta
1

CIL is more of a bytecode than an assembly language. In particular, it is not a textual, human-readable form, unlike assembly languages (CIL probably also defines the format of bytecode files).

The MSIL JIT is an implementation of a virtual machine for that bytecode. How implementations (from Microsoft or from Mono) translate CIL into machine code is an implementation detail which should not really matter to you (and given that Microsoft's VM is probably proprietary, they won't tell you how it is done). I think that Mono, a free software implementation of CIL, is using LLVM, so it probably doesn't translate one bytecode instruction at a time but rather entire methods or functions.

Basile Starynkevitch