Creating a C++ compiler/linker for a homemade opcode list

Question

Is there a way to tell a C++ compiler/linker to compile source code into my own homemade opcode list? I need it for a virtual machine that will run on a microcontroller.
I don't want to write a C++ compiler from scratch. I only want to take an existing open-source compiler and change the opcodes, the addresses of the CPU status register, stack pointer and GPIO registers, and the program memory and data memory. That way, people writing programs for it don't have to rewrite all their code; they can port it using libraries that are compatible with my compiler's libraries.
An example of such a compiler is avr-gcc.
The compiler and its libraries must not be proprietary in a way that makes me or any programmer pay for them, and I don't want them to be GPL in a way that forces programmers to reveal the source of their own projects. I want all my programmers to use my compiler freely, to license their work however they want, and to choose whether to make it open source or proprietary.

Look at LLVM. (As for the licence: GCC is GPL, but that doesn't force the GPL onto code compiled with it.) — Mat, Jan 18 '15 at 14:41
I'm at the downloads. What should I choose first? How do I start? — Foxcat385, Jan 18 '15 at 14:53
Start with the documentation, and once you've got an idea about what it's all about, look deeper into the "backends" documentation. (What you're trying to achieve is a **lot** of hard work, btw.) — Mat, Jan 18 '15 at 14:55
You need to take a two-year course on compiler design and computer architecture! — Lightness Races in Orbit, Jan 18 '15 at 15:37
I disagree; I think it's entirely possible to retarget an existing compiler without a deep understanding of compiler theory - for instance, you don't need to know squat about lexing, parsing, IR design or optimization. However, a thorough understanding of the target architecture is of course required. :-) — Martin Törnwall, Jan 18 '15 at 16:06
What CPU does your microcontroller use? There is probably already a compiler for it. — brian beuning, Jan 18 '15 at 16:07
My microcontroller is an AVR UC3. It will run a kernel that loads programs as files from an SD card and JITs them into RAM, where all addresses are relative. I'm doing this because the microcontroller doesn't have virtualization, so I'm trying to implement it myself. I think the best way is to define my own virtual machine code, which the kernel simply compiles to the microcontroller's machine code, resolving every JMP, CALL, branch and function pointer according to where the kernel allocates its destination. — Foxcat385, Jan 18 '15 at 16:29

Martin Törnwall · Answer 1 · 2015-01-18T15:22:10.777

Let's consider the steps involved:

Retargeting an existing C++ compiler: Several production-quality, retargetable C++ compilers are freely available today. For instance, the LLVM platform (clang++) provides some pointers on writing a backend for a new hardware architecture (this naturally applies to VMs as well!). Unfortunately, up-to-date documentation on porting the GNU compilers is harder to come by. It's entirely possible that many of the older documents remain relevant today, but I know far too little about GCC to say.

Note that the effort required to retarget either compiler is likely to depend on how well the instruction set of your virtual machine matches the compiler's low-level intermediate representation. Such representations often (at least semantically) take the form of three-address code, that is, instructions with two source operands and one destination. Writing a code generator for, say, a stack machine (in which all operands are implicitly addressed) could therefore prove to be a bit more difficult.
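To make the mismatch concrete, here is a minimal sketch of lowering one three-address instruction to a stack-machine sequence. The `ThreeAddr` struct, the `lowerToStack` function and the `PUSH`/`ADD`/`SUB`/`MUL`/`POP` mnemonics are all made up for illustration; they are not taken from LLVM, GCC or any real VM.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical three-address instruction: dst = lhs op rhs.
struct ThreeAddr {
    std::string dst, lhs, rhs;
    char op;  // one of '+', '-', '*'
};

// Lower a single three-address instruction to a sequence for an
// imaginary stack machine: push both sources, apply the operator
// (which consumes two stack slots and pushes one), pop into dst.
std::vector<std::string> lowerToStack(const ThreeAddr& ins) {
    std::string opName;
    switch (ins.op) {
        case '+': opName = "ADD"; break;
        case '-': opName = "SUB"; break;
        case '*': opName = "MUL"; break;
        default:  throw std::invalid_argument("unsupported operator");
    }
    return {"PUSH " + ins.lhs, "PUSH " + ins.rhs, opName, "POP " + ins.dst};
}
```

Calling `lowerToStack({"t1", "a", "b", '+'})` yields four stack instructions for one IR instruction. A real backend would try to keep intermediate values on the stack between instructions instead of popping and re-pushing them, which is where most of the extra work goes.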

From this point on, you really have two options. You could stick to the conventional way in which C++ programs are compiled, i.e., from source, to assembly, to object files, to linked executable or library. That involves going through the steps I have outlined below. But since you are targeting a virtual machine, it may have requirements that are radically different from those of modern hardware architectures. In that case, you may want to steer clear of existing software like binutils and roll your own assembler and linker.

Writing or porting an assembler: Unless your chosen compiler is able to directly generate machine code, you will most likely also need to write an assembler for your virtual machine, or port an existing one. If your virtual machine's instruction set looks anything like that of a modern machine, and if you want to use the standard C++ compilation/linking pipeline, you could look into porting binutils, specifically gas, the GNU assembler.
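If you do roll your own, the core of an assembler is small. Below is a rough two-pass sketch: the first pass records label addresses, the second emits bytes. The instruction set (`NOP` = 0x00, `HALT` = 0xFF, `JMP` = 0x10 followed by a 16-bit little-endian address) and the `assemble` function are invented for this example and have nothing to do with gas.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Assemble a tiny hypothetical instruction set:
//   NOP  -> 0x00            (1 byte)
//   HALT -> 0xFF            (1 byte)
//   JMP label -> 0x10 lo hi (3 bytes, absolute 16-bit address)
// A line whose first word ends in ':' defines a label.
std::vector<std::uint8_t> assemble(const std::vector<std::string>& lines) {
    std::map<std::string, std::uint16_t> labels;

    // Pass 1: walk the source and record the address of every label.
    std::uint16_t addr = 0;
    for (const std::string& line : lines) {
        std::istringstream in(line);
        std::string word;
        if (!(in >> word)) continue;  // blank line
        if (word.back() == ':') {
            labels[word.substr(0, word.size() - 1)] = addr;
        } else {
            addr = static_cast<std::uint16_t>(addr + (word == "JMP" ? 3 : 1));
        }
    }

    // Pass 2: emit machine code, now that all label addresses are known.
    std::vector<std::uint8_t> out;
    for (const std::string& line : lines) {
        std::istringstream in(line);
        std::string word, arg;
        if (!(in >> word) || word.back() == ':') continue;
        if (word == "NOP") {
            out.push_back(0x00);
        } else if (word == "HALT") {
            out.push_back(0xFF);
        } else if (word == "JMP") {
            if (!(in >> arg)) throw std::runtime_error("JMP needs a label");
            auto it = labels.find(arg);
            if (it == labels.end())
                throw std::runtime_error("undefined label: " + arg);
            out.push_back(0x10);
            out.push_back(static_cast<std::uint8_t>(it->second & 0xFF));
            out.push_back(static_cast<std::uint8_t>(it->second >> 8));
        } else {
            throw std::runtime_error("unknown mnemonic: " + word);
        }
    }
    return out;
}
```

This sketch resolves labels itself and emits absolute addresses, so it only works for a single source file. With separate compilation, the assembler would leave those address fields blank and emit relocation records for the linker to fill in.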

Writing or porting a linker: The object files produced by your assembler are not in themselves executable programs. Addresses must be assigned to symbols and segments, and references between object files must be resolved. This means that the linker needs some understanding of your instruction set. In particular, it must be able to find and patch locations in code and data that address memory. The binutils porting guide I linked above is relevant here, too; you may also enjoy reading Linkers and Loaders.
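The "find and patch" step can be sketched as follows. The `Relocation` record and the `applyRelocations` function are hypothetical, and the sketch assumes a single relocation kind (a 16-bit little-endian absolute address); real object formats such as ELF define many relocation types per architecture.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical relocation record: the 16-bit little-endian field at
// `offset` in the code must hold the final address of `symbol`.
struct Relocation {
    std::size_t offset;
    std::string symbol;
};

// Patch every relocation site once the linker has assigned final
// addresses to all symbols.
void applyRelocations(std::vector<std::uint8_t>& code,
                      const std::vector<Relocation>& relocs,
                      const std::map<std::string, std::uint16_t>& symbols) {
    for (const Relocation& r : relocs) {
        auto it = symbols.find(r.symbol);
        if (it == symbols.end())
            throw std::runtime_error("undefined symbol: " + r.symbol);
        if (r.offset + 2 > code.size())
            throw std::out_of_range("relocation outside code section");
        code[r.offset]     = static_cast<std::uint8_t>(it->second & 0xFF);
        code[r.offset + 1] = static_cast<std::uint8_t>(it->second >> 8);
    }
}
```

The same mechanism is what a loader needs if programs are placed at arbitrary RAM addresses: keep the relocation records in the program file and apply them at load time with the actual base address added to each symbol.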

As @Mat noted in the comment section above, the GPL doesn't usually "infect" the output of a program licensed under it. See this section. Notably:

The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work.

I am not a lawyer, but I take this to mean that an exception would be made for, say, compiling the compiler with itself: in that case the output would still be subject to the terms of the GPL.

The virtual machine code is something like "my own Java" which the microcontroller's kernel compiles into its own machine code. The reason I don't use the existing compiler for that microcontroller is that it cannot compile a program so that its addresses are relative, while the programs are loaded all over the RAM. Tl;dr: it doesn't have virtualization, so I'm trying to implement it myself. — Foxcat385, Jan 18 '15 at 16:33
Yeah, but what if I have function pointers whose values are fixed to an absolute address and not the relocated one? I'd need to make the compiler use a function ID for every function and emit a list of trampolines, each of the form "JMP function_address". A jump to a function through a pointer then becomes a jump to the label "TrampolinesStart:" plus JMP instruction size * function ID. Still, that needs a change in the compiler. Also, I would need to remove the interrupt vectors from the loaded program by not compiling them, because they're never used. — Foxcat385, Jan 20 '15 at 15:43
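
The trampoline scheme described in the last comment could look roughly like this. It is only a sketch: the 3-byte `JMP` encoding (opcode 0x10 plus a 16-bit little-endian address) and the names `kJmpSize`, `trampolineAddress` and `buildTrampolines` are assumptions for illustration, not part of any existing compiler or of the AVR UC3 instruction set.

```cpp
#include <cstdint>
#include <vector>

// Assumed size in bytes of one "JMP function_address" in the VM encoding.
constexpr std::uint32_t kJmpSize = 3;

// Address of the trampoline slot for a function ID:
// TrampolinesStart + JMP instruction size * function ID.
constexpr std::uint32_t trampolineAddress(std::uint32_t trampolinesStart,
                                          std::uint32_t functionId) {
    return trampolinesStart + kJmpSize * functionId;
}

// Build the trampoline table at load time. Each function's offset within
// the program is rebased onto the address where the kernel loaded it,
// so only this table has to change when a program is relocated.
std::vector<std::uint8_t> buildTrampolines(
        std::uint16_t loadBase,
        const std::vector<std::uint16_t>& functionOffsets) {
    std::vector<std::uint8_t> table;
    for (std::uint16_t offset : functionOffsets) {
        const std::uint16_t target =
            static_cast<std::uint16_t>(loadBase + offset);
        table.push_back(0x10);  // hypothetical JMP opcode
        table.push_back(static_cast<std::uint8_t>(target & 0xFF));
        table.push_back(static_cast<std::uint8_t>(target >> 8));
    }
    return table;
}
```

With this layout a function pointer stores only the function ID, so its value stays valid wherever the program ends up in RAM.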
