Writing languages for the JVM

Question

Suppose I write a programming language; for the sake of argument, I'll call it lang.

To begin the long journey of writing lang, I decide to write the lang compiler in lang itself. I can't actually run it, because there's nothing yet that can compile or run lang code.

So I begin by writing another compiler for lang in Java. This time, when I am done, I decide to convert it to Bytecode, and leave it at that. I now have a working compiler, which will convert all my lang code into Bytecode.

So I decide to plug in my self-compiler for the language, into the compiler I just made in Java. I then convert the self-compiler to Bytecode, and chuck out the Java compiler. I now have a lang compiler, purely written in itself, converted into Bytecode, ready for use.

This creates a solid program, and I understand all of this, but my question is, relative to compiler design for the JVM, what if I decide to release an update for my language? How do I go about updating the Bytecode? Do I simply re-write the updated version of the language in the older one?

I ask this because this is what I want to do: write a not-yet-existing language in itself, and then bootstrap it onto the JVM by first creating a compiler in Java.

It's the same as what was done with C++. C with Classes was written, and then C++ in it, and finally C with Classes was abandoned for the bootstrapped C++. But then how on earth did they ever go about updating the language?

Debugging byte code can be pretty painful. I suggest you write a translator of `lang` into `java`. This way you can see what it is doing and debug the translated Java. Later you can write a more efficient compiler straight to byte code. — Peter Lawrey, Feb 16 '17 at 11:37
@Peter Lawrey: “Debugging byte code” is rarely needed. If your generated class file has debug attributes specifying the source file name and the mapping of instructions to line numbers, you can debug it as smoothly as Java source code, regardless of which language it was actually written in. — Holger, Feb 16 '17 at 16:12
If you upgrade your language, you have to implement the newer features using the features of the previous language only. Only after the completion of that, you can start using the new features within the compiler. — Holger, Feb 16 '17 at 16:15

score 3 · Accepted Answer · answered Feb 16 '17 at 11:12

I'll answer this from two possible scenarios in your development. With any byte-code language at any time you can update the virtual machine or the language.

Suppose first you wanted to update your language to have new syntax or change the current semantics. Then you'd keep your current compiled compiler written in lang (compiler A) and edit its source so that it can correctly compile your new features. Then you compile your compiler using the old one giving you compiler B. If necessary, you can now rewrite the compiler to use the new features and then compile it using compiler B to give you compiler C.
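The A, B, C sequence above can be modelled as a small sketch. Everything here is hypothetical: the feature names `v1` and `v2`, and the `CompilerSource` and `CompiledCompiler` types, are illustration only, and a "compiler" is reduced to the set of language features it accepts. The point it tries to show is that each stage can only be built by a compiler that already understands every feature the new source uses.

```java
import java.util.Set;

public class BootstrapStages {
    /** Compiler source code: the features its own code uses, and the features it can compile. */
    record CompilerSource(Set<String> uses, Set<String> accepts) {}

    /** A compiled compiler, reduced to the set of language features it accepts. */
    record CompiledCompiler(Set<String> accepts) {
        CompiledCompiler compile(CompilerSource src) {
            // A compiler can only build sources written in features it already understands.
            if (!accepts.containsAll(src.uses())) {
                throw new IllegalStateException("source uses features this compiler cannot compile");
            }
            return new CompiledCompiler(src.accepts());
        }
    }

    static CompiledCompiler bootstrap() {
        // Compiler A: the existing compiled compiler, which understands only v1.
        CompiledCompiler a = new CompiledCompiler(Set.of("v1"));

        // Edit the source so it can compile v2, while still being written in v1 only.
        CompilerSource taughtV2 = new CompilerSource(Set.of("v1"), Set.of("v1", "v2"));
        CompiledCompiler b = a.compile(taughtV2);

        // Optionally rewrite the compiler to use v2 itself, then build it with B.
        CompilerSource usesV2 = new CompilerSource(Set.of("v1", "v2"), Set.of("v1", "v2"));
        return b.compile(usesV2); // compiler C
    }

    public static void main(String[] args) {
        System.out.println("C accepts: " + bootstrap().accepts());
    }
}
```

Trying to build the `usesV2` source directly with compiler A throws in this model, which is why the intermediate compiler B is needed.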

What if the JVM changes? Well, in that case you keep an old version of the JVM around, adjust your compiler to cope with the new bytecode changes, and then compile it with the old one (this is analogous to compiler B from before). That will get you a compiler that compiles to the new bytecode but runs on the old VM. The next step is to get it to compile itself, and now you have a new compiler that runs on the new VM (analogous to compiler C).
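When juggling an old and a new VM like this, it can help to check which class-file version a given compiler build was emitted as. Below is a minimal sketch that reads the header every class file starts with: the magic number `0xCAFEBABE`, then a two-byte minor version and a two-byte major version. The class and method names are made up for illustration, and the `runsOn` check is a simplification that only compares major versions.

```java
import java.nio.ByteBuffer;

public class ClassFileVersion {
    /** Returns the major version stored in the first 8 bytes of a class file. */
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("too short to be a class file");
        }
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("missing 0xCAFEBABE magic number");
        }
        buf.getShort();                 // skip minor version
        return buf.getShort() & 0xFFFF; // major version as an unsigned value
    }

    /** Simplified check: a VM loads class files up to the highest major version it supports. */
    static boolean runsOn(int fileMajor, int vmMaxMajor) {
        return fileMajor <= vmMaxMajor;
    }

    public static void main(String[] args) {
        // Header of a class file with major version 52 (the version Java 8 produces).
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        System.out.println(majorVersion(header));
    }
}
```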

score 0 · Answer 2 · answered Feb 16 '17 at 11:10

I don't think your compiler is the best way to go about this.

I'd start with a grammar for my language.

Next comes the lexer/parser to turn expressions in my language to an abstract syntax tree (AST). The AST is a correct intermediate representation of the expression.

You would emit bytecode or assembly language instructions for the virtual machine or processor of your choice by writing a code generator that traverses the AST.
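As a rough illustration of that pipeline, here is a sketch for a hypothetical expression-only subset of lang (integers, `+`, `*`, parentheses). The grammar, the node types `Num` and `BinOp`, and the class name are all assumptions. The generator emits JVM-style mnemonics as plain text rather than a real class file, to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

public class TinyCompiler {
    // AST node types for the hypothetical subset.
    sealed interface Expr permits Num, BinOp {}
    record Num(int value) implements Expr {}
    record BinOp(char op, Expr left, Expr right) implements Expr {}

    // Grammar: expr := term ('+' term)* ; term := factor ('*' factor)* ;
    //          factor := digits | '(' expr ')'
    static final class Parser {
        private final String src;
        private int pos = 0;

        Parser(String src) { this.src = src.replace(" ", ""); }

        Expr parse() {
            Expr e = expr();
            if (pos != src.length()) throw new IllegalArgumentException("unexpected input at " + pos);
            return e;
        }

        private boolean at(char c) { return pos < src.length() && src.charAt(pos) == c; }

        private Expr expr() {
            Expr left = term();
            while (at('+')) { pos++; left = new BinOp('+', left, term()); }
            return left;
        }

        private Expr term() {
            Expr left = factor();
            while (at('*')) { pos++; left = new BinOp('*', left, factor()); }
            return left;
        }

        private Expr factor() {
            if (at('(')) {
                pos++;
                Expr e = expr();
                if (!at(')')) throw new IllegalArgumentException("expected ')' at " + pos);
                pos++;
                return e;
            }
            int start = pos;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            if (start == pos) throw new IllegalArgumentException("expected number at " + pos);
            return new Num(Integer.parseInt(src.substring(start, pos)));
        }
    }

    // Code generator: post-order traversal, so operands are pushed before their operator.
    static void emit(Expr e, List<String> out) {
        if (e instanceof Num n) {
            out.add("ldc " + n.value());
        } else if (e instanceof BinOp b) {
            emit(b.left(), out);
            emit(b.right(), out);
            out.add(b.op() == '+' ? "iadd" : "imul");
        }
    }

    static List<String> compile(String source) {
        List<String> out = new ArrayList<>();
        emit(new Parser(source).parse(), out);
        return out;
    }

    public static void main(String[] args) {
        compile("1 + 2 * 3").forEach(System.out::println);
    }
}
```

In this layout a new operator touches the grammar, the AST, and `emit`, while retargeting or optimizing the output touches only `emit`.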

Where does your update happen?

If it's language fundamentals, you have to modify both the grammar and the bytecode emission.

If you're optimizing the bytecode or porting to a new processor you have to modify the code generator.

answered Feb 16 '17 at 11:10

duffymo


Thank you! However, when you say language fundamentals: as I modify the bytecode emission, should I be writing the emitter in Java, or in the previous version (or a commit) of the language in question? – Feb 16 '17 at 11:12
A grammar is also a good basis for a language reference. However, you then tie yourself to Java as the parser implementation language until the parser is written in lang. But +1 – Joop Eggen Feb 16 '17 at 11:14
I didn't say it had to be a Java parser. Who cares if the parser is written in the original language? Use bison if you wish. – duffymo Feb 16 '17 at 12:46

Joop Eggen · Answer 3 · 2017-02-16T11:24:46.463

The first lang compiler can be written in a subset of lang, so you only need a bootstrap compiler (or even an interpreter) for that subset. This can be written in Java.

Later, more extensive compilers can be written in lang. Newer versions can be too.

You could even write a translator that converts a lang program to Java, use that to create a first translator in lang, and then turn it into a bytecode compiler.
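A lang-to-Java translator can start out very small. The sketch below assumes a made-up lang syntax with only two statement forms, `let name = expr` and `print expr`, whose expressions happen to be valid Java integer expressions. None of this comes from a real language definition. It produces Java source text, which could then be handed to `javac`.

```java
import java.util.List;

public class LangToJava {
    /** Translates hypothetical lang statements into the source of a Java class. */
    static String translate(String className, List<String> langLines) {
        StringBuilder sb = new StringBuilder();
        sb.append("public class ").append(className).append(" {\n");
        sb.append("    public static void main(String[] args) {\n");
        for (String line : langLines) {
            String stmt = line.trim();
            if (stmt.startsWith("let ")) {
                // "let x = 1 + 2" becomes "int x = 1 + 2;"
                sb.append("        int ").append(stmt.substring(4)).append(";\n");
            } else if (stmt.startsWith("print ")) {
                // "print x" becomes "System.out.println(x);"
                sb.append("        System.out.println(").append(stmt.substring(6)).append(");\n");
            } else if (!stmt.isEmpty()) {
                throw new IllegalArgumentException("unknown statement: " + stmt);
            }
        }
        sb.append("    }\n");
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(translate("Demo", List.of("let x = 1 + 2", "print x")));
    }
}
```

Because the output is ordinary Java source, it can be read and stepped through in a debugger while the language is still changing.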

edited Feb 16 '17 at 11:24

answered Feb 16 '17 at 11:12

Joop Eggen
