Inside a bytecode (P-CODE) compiler

Question

Does a bytecode compiler first convert the source code into VM assembly language which is then converted to VM bytecode?

e.g.

[Source Code] --> { [VM Bytecode Compiler (lex, parse into AST)] --> [AST to VM assembly] } --> [VM Assembler] --> [VM Bytecode]

I have thought about the design of a VM, and I cannot complete the theoretical design without using a bytecode assembler. AFAIK, a bytecode assembler is something that you cannot do without. What are some of the use cases for a bytecode assembler that you can think of? – DeLorean, Oct 11 '13 at 13:49
Well... you can generate bytecode directly (not VM assembly and then -> bytecode). – dbrank0, Oct 11 '13 at 16:49
I believe that if you do not use the VM assembly stage during bytecode compilation, then you can use the VM for only one type of source language. To get more out of your VM (to make it language agnostic), it would be better to think up a good pseudo-assembly language. I stayed up late last night researching this issue, and this conclusion is the best I could come up with. Unless someone thinks differently and is willing to share? – DeLorean, Oct 12 '13 at 07:48
There are even C -> x86 machine code compilers that do not use the intermediate step of creating assembler code. – Martin Rosenau, Oct 12 '13 at 11:24

score 0 · Accepted Answer · answered Oct 15 '13 at 12:33


It seems that you have the wrong impression of what an assembly language is. An assembly language is mostly a human-readable representation of a certain machine code. For a compiler, there is no advantage in creating a human-readable intermediate representation of the code it generates.

For Java there is no standard assembly representation at all, though there are tools dealing with pseudo-Java-assembly languages that look close to each other because they all use the same well-known names from the JVM specification. So, for example, you can translate all .class files into a pseudo-assembly source with the javap command shipped with the JDK (using its -c option). But that does not give the tools any advantage, neither javac nor compilers for other programming languages.

Most compilers have some sort of intermediate representation while compiling but that’s not an assembly language representation.
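To illustrate the point about skipping a textual assembly stage, here is a minimal sketch in Python of a compiler back end that walks an AST and appends opcode bytes straight into a buffer. The opcode values (PUSH, ADD, MUL), the tuple-shaped AST, and the function name compile_expr are all assumptions made up for this example; they are not taken from the JVM or from any real VM.

```python
# Hypothetical single-byte opcodes for a toy stack VM (assumed values, not the JVM's).
PUSH, ADD, MUL = 0x01, 0x02, 0x03
BINARY_OPS = {"+": ADD, "*": MUL}


def compile_expr(node, out):
    """Append bytecode for `node` to the bytearray `out` (post-order AST walk)."""
    if isinstance(node, int):
        # Leaf: a constant. This sketch only supports one-byte immediates.
        if not 0 <= node <= 255:
            raise ValueError("this sketch only encodes one-byte constants")
        out += bytes([PUSH, node])
    else:
        # Inner node: (operator, left subtree, right subtree).
        op, left, right = node
        compile_expr(left, out)
        compile_expr(right, out)
        out.append(BINARY_OPS[op])
    return out


# AST for the expression 2 + 3 * 4; the AST itself is the intermediate representation.
ast = ("+", 2, ("*", 3, 4))
code = bytes(compile_expr(ast, bytearray()))
print(code.hex(" "))  # prints: 01 02 01 03 01 04 03 02
```

No mnemonic text is produced or parsed at any point; the in-memory AST plays the role of the intermediate representation, and the output is binary from the start.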

answered Oct 15 '13 at 12:33

Holger


I read somewhere that it all comes down to how the virtual machine is intended to work eventually. You can have an assembly language interpreter built into the VM, so that source code is mapped to assembly mnemonics which are then executed by the assembly interpreter. Or you can have a bytecode interpreter instead, where the source language is mapped to the 2-byte opcodes used to encode your pseudo instruction set, so that all you need is a bytecode interpreter. So now the question is: which one is faster, the assembly interpreter or the bytecode interpreter? – DeLorean Oct 17 '13 at 06:38
@DeLorean: the assembly language is a *source code* artifact. It will always get translated to a binary code before execution; in the case of the JVM, the binary code is the byte code. By the way, the name “byte code” reflects the fact that the opcodes are *single bytes*, not two-byte words. Other VMs might have different opcode sizes, but you shouldn’t trust a book which tells you there is a fixed opcode size for all kinds of VMs. – Holger Oct 17 '13 at 07:23
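As a rough illustration of what executing single-byte opcodes looks like, here is a small Python sketch of a dispatch loop for a toy stack VM. The opcode values, the HALT instruction, and the run function are hypothetical names chosen for this example and are not the JVM's instruction set.

```python
# Hypothetical single-byte opcodes for a toy stack VM (assumed values, not the JVM's).
PUSH, ADD, MUL, HALT = 0x01, 0x02, 0x03, 0xFF


def run(code: bytes) -> int:
    """Execute `code` and return the value on top of the stack at HALT."""
    stack = []
    pc = 0  # index of the next byte to read
    while True:
        opcode = code[pc]  # a single byte fetch identifies the instruction
        pc += 1
        if opcode == PUSH:
            stack.append(code[pc])  # one-byte immediate operand follows the opcode
            pc += 1
        elif opcode == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode 0x{opcode:02x} at offset {pc - 1}")


# Binary program for 2 + 3 * 4, written as raw bytes with no text parsing involved.
program = bytes([PUSH, 2, PUSH, 3, PUSH, 4, MUL, ADD, HALT])
result = run(program)
print(result)  # prints 14
```

The loop never looks at mnemonic text: each instruction is identified by one indexed byte read, which is the property the name “byte code” refers to.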
I asked the question with reference to an application VM; perhaps I should have mentioned that before. This makes it plausible to have the assembly instructions interpreted directly by the VM (no binary code execution needed here), AFAIK. To turn the pseudo-assembly into bytecode I will eventually need an assembler, and I will then change the underlying design of the VM to execute bytecode instructions via a bytecode interpreter. – DeLorean Oct 17 '13 at 10:21
Well, today almost any kind of scripting language implementation will “interpret” the script by pre-compiling it into a binary form at loading time, right before execution; some might even cache the compiled form on disk. There’s no benefit in interpreting without compilation even if the script is executed exactly once, but much to lose if it is executed multiple times or has loops. The same holds true for an assembly-language-like scripting language as well (I wouldn’t call it “assembly language” anymore if it’s interpreted). – Holger Oct 17 '13 at 12:00
So for practical, real-life VMs your question comes down to whether the assembler is inside the VM or an external tool. Since high-level language compilers do not need an assembler, letting it be an optional external tool is the better choice. – Holger Oct 17 '13 at 12:03
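The “optional external tool” arrangement could look roughly like the following Python sketch: a standalone assembler that turns a human-written listing into the binary format, so that the VM itself only ever loads bytes. The mnemonic table, the opcode values, the `;` comment syntax, and the assemble function are assumptions invented for this example.

```python
# Hypothetical mnemonic table for a toy stack VM; opcode values are assumed.
OPCODES = {"PUSH": 0x01, "ADD": 0x02, "MUL": 0x03, "HALT": 0xFF}
HAS_OPERAND = {"PUSH"}  # mnemonics followed by a one-byte immediate


def assemble(text: str) -> bytes:
    """Translate an assembly listing into bytecode for the toy VM."""
    out = bytearray()
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split(";", 1)[0].strip()  # drop ';' comments and whitespace
        if not line:
            continue
        mnemonic, *operands = line.split()
        mnemonic = mnemonic.upper()
        if mnemonic not in OPCODES:
            raise ValueError(f"line {line_no}: unknown mnemonic {mnemonic!r}")
        expected = 1 if mnemonic in HAS_OPERAND else 0
        if len(operands) != expected:
            raise ValueError(f"line {line_no}: {mnemonic} takes {expected} operand(s)")
        out.append(OPCODES[mnemonic])
        for operand in operands:
            value = int(operand)
            if not 0 <= value <= 255:
                raise ValueError(f"line {line_no}: operand {value} does not fit in a byte")
            out.append(value)
    return bytes(out)


listing = """
    PUSH 2
    PUSH 3      ; operands are one byte each in this sketch
    PUSH 4
    MUL
    ADD
    HALT
"""
assembled = assemble(listing)
print(assembled.hex(" "))  # prints: 01 02 01 03 01 04 03 02 ff
```

Because the assembler's output is the same binary format a high-level compiler would emit directly, it can live entirely outside the VM and be used only for hand-written tests or debugging.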
