What would be the easiest way to create a C compiler for a custom CPU, assuming of course I already have an assembler for it?

Since a C compiler generates assembly, is there some way to just define standard bits and pieces of assembly code for the various C idioms, rebuild the compiler, and thereby obtain a cross compiler for the target hardware?

Preferably the compiler itself would be written in C, and build as a native executable for either Linux or Windows.

Please note: I am not asking how to write the compiler itself. I did take that course in college, I know about general compiler-compilers, etc. In this situation, I'd just like to configure some existing framework if at all possible. I don't want to modify the language, I just want to be able to target an arbitrary architecture. If the answer turns out to be "it doesn't work that way", that information will be useful to myself and anyone else who might make similar assumptions.

JustJeff
  • There is no *fundamental* requirement that the compiler produce assembler: that's just a common and convenient practice. – dmckee --- ex-moderator kitten Jan 01 '12 at 23:15
  • @dmckee - let's assume it's a requirement in this case. I know there are those in this community that would have an answer for this. Please note, the answer for what I am asking is *not* "go take a compiler writing course". – JustJeff Jan 01 '12 at 23:17
  • PS: Assuming you've got this custom CPU that's so brand-new you don't even have a compiler for it. Where's the *OS* going to come from??? What do you mean "native executable"? Native to *WHAT*?!? – paulsm4 Jan 01 '12 at 23:24
  • @JustJeff: What you're asking is "how to write my own backend for GCC?" (substitute "clang", etc. for "GCC" if appropriate). This is still non-trivial, and probably the best place to start is with the relevant documentation (e.g. http://llvm.org/releases/2.3/docs/WritingAnLLVMBackend.html). (I'll quite happily admit that I'm out of my depth at this point!) – Oliver Charlesworth Jan 01 '12 at 23:26
  • @paulsm4 - yes, assume it's new. assume no OS. 'native executable' was used to mean 'a binary that executes on the cpu' – JustJeff Jan 21 '12 at 15:04

6 Answers

Quick overview/tutorial on writing an LLVM backend.

This document describes techniques for writing backends for LLVM which convert the LLVM representation to machine assembly code or other languages.

[ . . . ]

To create a static compiler (one that emits text assembly), you need to implement the following:

  • Describe the register set.
  • Describe the instruction set.
  • Describe the target machine.
  • Implement the assembly printer for the architecture.
  • Implement an instruction selector for the architecture.
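
As a rough illustration of how those pieces relate, here is a toy sketch in Python. It is not LLVM's API (real LLVM backends are written in TableGen and C++); the register names, mnemonics, and the tuple-based IR are all invented for the example.

```python
from dataclasses import dataclass

# Register set of a hypothetical CPU (names are made up).
REGISTERS = ["r0", "r1", "r2", "r3"]

# Instruction set: generic IR opcode -> target mnemonic (also made up).
INSTRUCTIONS = {"add": "ADD", "sub": "SUB", "mul": "MUL"}


@dataclass
class MachineInstr:
    mnemonic: str
    operands: tuple


def select(ir):
    """Instruction selector: map (op, dst, a, b) IR tuples to machine instructions."""
    out = []
    for op, dst, a, b in ir:
        if op not in INSTRUCTIONS:
            raise ValueError(f"no pattern for IR op {op!r}")
        for reg in (dst, a, b):
            if reg not in REGISTERS:
                raise ValueError(f"unknown register {reg!r}")
        out.append(MachineInstr(INSTRUCTIONS[op], (dst, a, b)))
    return out


def print_asm(instrs):
    """Assembly printer: render machine instructions as text for the assembler."""
    return "\n".join(f"    {i.mnemonic} {', '.join(i.operands)}" for i in instrs)


if __name__ == "__main__":
    print(print_asm(select([("mul", "r1", "r2", "r3"), ("add", "r0", "r0", "r1")])))
```

A real backend also has to deal with register allocation, calling conventions, and instructions that have no one-to-one IR counterpart, which is where most of the effort goes.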
Pubby

There's the concept of a cross-compiler, i.e., one that runs on one architecture but targets a different one. You can see how GCC does it (for example) and add a new architecture to the set, if that's the compiler you want to extend.

Edit: I just spotted a question from a few years ago on a GCC mailing list about how to add a new target, and someone pointed to this

Ricardo Cárdenes

The short answer is that it doesn't work that way.

The longer answer is that it does take some effort to write a compiler for a new CPU type. You don't need to create a compiler from scratch, however. Most compilers are structured in several passes; here's a typical architecture (a lot of variations are possible):

  1. Syntactic analysis (lexer and parser, plus preprocessing in the case of C), leading to an abstract syntax tree.
  2. Type checking, leading to an annotated abstract syntax tree.
  3. Intermediate code generation, leading to architecture-independent intermediate code. Some optimizations are performed at this stage.
  4. Machine code generation, leading to assembly or directly to machine code. More optimizations are performed at this stage.

In this description, only step 4 is machine-dependent. So you can take a compiler where step 4 is clearly separated and plug in your own step 4. Doing this requires a deep understanding of the CPU and some understanding of the compiler internals, but you don't need to worry about what happens before.
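
To make the "plug in your own step 4" idea concrete, here is a minimal sketch in Python, under heavy simplifying assumptions: the front end only handles integer arithmetic expressions (it borrows Python's own `ast` parser as a stand-in for steps 1 to 3), the intermediate code is a tiny stack IR, and both target CPUs and their mnemonics are hypothetical.

```python
import ast


def lower(expr):
    """Stand-in for steps 1-3: parse an arithmetic expression into stack IR."""
    ops = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}
    ir = []

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            ir.append(("push", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in ops:
            walk(node.left)
            walk(node.right)
            ir.append((ops[type(node.op)],))
        else:
            raise ValueError("unsupported expression")

    walk(ast.parse(expr, mode="eval").body)
    return ir


def backend_stack(ir):
    """Step 4 for a hypothetical stack CPU: one instruction per IR op."""
    return [f"PUSH #{ins[1]}" if ins[0] == "push" else ins[0].upper() for ins in ir]


def backend_regs(ir):
    """Step 4 for a hypothetical register CPU: stack slots become r0, r1, ..."""
    out, depth = [], 0
    for ins in ir:
        if ins[0] == "push":
            out.append(f"LDI r{depth}, {ins[1]}")
            depth += 1
        else:
            depth -= 1  # two operands are consumed, one result is produced
            out.append(f"{ins[0].upper()} r{depth - 1}, r{depth - 1}, r{depth}")
    return out


def compile_expr(expr, backend):
    """The machine-independent part is shared; only the backend is swapped."""
    return backend(lower(expr))


if __name__ == "__main__":
    print(compile_expr("1 + 2 * 3", backend_stack))
    print(compile_expr("1 + 2 * 3", backend_regs))
```

The point of the sketch is only the shape: `lower` never changes when a new CPU is added, while each backend is written against the same IR.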

Almost all CPUs that are not very small, very rare or very old have a backend (step 4) for GCC. The main documentation for writing a GCC backend is the GCC internals manual, in particular the chapters on machine descriptions and target descriptions. GCC is free software, so there is no licensing cost in using it.

Gilles 'SO- stop being evil'

vbcc (at www.compilers.de) is a good and simple retargetable C-compiler written in C. It's much simpler than GCC/LLVM. It's so simple I was able to retarget the compiler to my own CPU with a few weeks of work without having any prior knowledge of compilers.

dsula
  • Interesting option. Usually people don't even think of free compilers outside of the gcc/clang/llvm family. Can you elaborate a little for the OP on how the retargeting process was done? (e.g., is there a clearly defined intermediate "generic machine code" stage from which you simply write a more or less direct translator into the real CPU instruction set?) – dodgethesteamroller Sep 18 '15 at 20:37
  • Yes, the VBCC compiler front-end outputs a generic machine code of sorts. It calls back-end functions (the ones you have to write) to translate those instructions into your target assembly instructions. The compiler is fairly powerful and offers good optimization. It takes very little time to get a functioning (although not very optimizing) backend going. If your goal is to achieve the best code possible, then it gets a bit harder. – dsula Oct 17 '17 at 11:30

1) Short answer:

"No. There's no such thing as a "compiler framework" where you can just add water (plug in your own assembly set), stir, and it's done."

2) Longer answer: it's certainly possible. But challenging. And likely expensive.

If you wanted to do it yourself, I'd start by looking at GNU CC (GCC). It's already available for a large variety of CPUs and platforms.

3) Take a look at this link for more ideas (including the idea of "just build a library of functions and macros"); that would be my first suggestion:

http://www.instructables.com/answers/Custom-C-Compiler-for-homemade-instruction-set/

paulsm4

You can modify existing open source compilers such as GCC or Clang. Other answers have provided you with links about where to learn more. But these compilers are not designed to be easily retargeted; they are merely "easier" to retarget than other compilers wired for specific targets.

But if you want a compiler that is relatively easy to retarget, you want one in which you can specify the machine architecture in explicit terms, and some tool generates the rest of the compiler (GCC does a bit of this; I don't think Clang/LLVM does much but I could be wrong here).

There's a lot of this in the literature; google "compiler-compiler".

But for a concrete solution for C, you should check out ACE, a compiler vendor that generates compilers on demand for customers. Not free, but I hear they produce very good compilers very quickly. I think it produces standard style binaries (ELF?) so it skips the assembler stage. (I have no experience or relationship with ACE.)

If you don't care about code quality, you can likely write a syntax-directed translation of C to assembler using a C AST. You can get C ASTs from GCC, Clang, maybe ANTLR, and from our DMS Software Reengineering Toolkit.
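
For a feel of what such a syntax-directed translation looks like, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the AST node classes are hand-rolled rather than taken from GCC, Clang, ANTLR, or DMS, and the target is a hypothetical accumulator machine whose `ADDS`/`SUBS` instructions compute `A = A op pop()`.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: object
    right: object


@dataclass
class Assign:
    target: str
    value: object


@dataclass
class While:
    cond: object
    body: list


class Gen:
    """One fixed assembly template per AST node type; no optimization at all."""

    def __init__(self):
        self.out = []
        self.labels = count()

    def expr(self, n):
        # Every expression leaves its result in the accumulator A.
        if isinstance(n, Num):
            self.out.append(f"LDI {n.value}")
        elif isinstance(n, Var):
            self.out.append(f"LDA {n.name}")
        elif isinstance(n, BinOp):
            self.expr(n.right)
            self.out.append("PUSH")  # save the right operand on the stack
            self.expr(n.left)
            self.out.append({"+": "ADDS", "-": "SUBS"}[n.op])
        else:
            raise TypeError(f"unsupported expression node: {n!r}")

    def stmt(self, n):
        if isinstance(n, Assign):
            self.expr(n.value)
            self.out.append(f"STA {n.target}")
        elif isinstance(n, While):
            top, end = f"L{next(self.labels)}", f"L{next(self.labels)}"
            self.out.append(f"{top}:")
            self.expr(n.cond)
            self.out.append(f"JZ {end}")  # leave the loop when the condition is 0
            for s in n.body:
                self.stmt(s)
            self.out.append(f"JMP {top}")
            self.out.append(f"{end}:")
        else:
            raise TypeError(f"unsupported statement node: {n!r}")


if __name__ == "__main__":
    g = Gen()
    # AST for the C fragment: while (n) { n = n - 1; }
    g.stmt(While(Var("n"), [Assign("n", BinOp("-", Var("n"), Num(1)))]))
    print("\n".join(g.out))
```

The generated code is correct but naive (every binary operation goes through the stack), which is exactly the code-quality trade-off mentioned above.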

Ira Baxter