I have been wanting to create my own programming language and I am looking to start writing a basic compiler. I am doing this purely for learning purposes. I will be writing the compiler in C#.

I have been trying to decide whether to generate IL or code in a high-level language. From the articles and tutorials I have read, C and MSIL (by way of Reflection.Emit) seem to be the most popular targets.

Which approach will make my programming language faster, assuming both are implemented optimally? Ideally I'd like the language to run on both Windows and Linux/OS X. I also understand that there may be better alternatives that I am not considering.

user3574076
  • Compilers don't generate high level languages. What you are describing is a conversion tool. – Ron Beyer Aug 13 '15 at 02:01
  • @RonBeyer We can see that the term of "compiler" is extended nowadays: https://github.com/Microsoft/TypeScript: "TypeScript is a superset of JavaScript that compiles to clean JavaScript output" – Nipheris Aug 13 '15 at 02:03
  • @RonBeyer incorrect, compilers convert one language into another. – Keith Nicholas Aug 13 '15 at 02:05
  • @KeithNicholas Please read what I wrote closely. A compiler does not turn code into another **high level language**. A compiler generates something like IL, assembly, etc. Something that has one more step to machine code. A compiler isn't a tool that generates something that gets fed into another compiler; that's a source converter. – Ron Beyer Aug 13 '15 at 02:25
  • Yes, there are S2S "compilers", but it's not the more common definition of the term compiler... Getting off topic here though. – Ron Beyer Aug 13 '15 at 02:36
  • @RonBeyer There's decidedly more than one step to convert CIL to machine code, just as there is for JavaScript (which is becoming a fairly popular target for cross-compilation). There's also plenty of history for languages that generate C as an intermediate language (http://programmers.stackexchange.com/a/257873/37027). – Preston Guillot Aug 13 '15 at 02:36
  • @RonBeyer read closely what I'm saying: a compiler converts code into another language. That other language can be any other language. It's typically a lower-level language, but it doesn't have to be; if it goes high level to high level, it's typically called a transpiler (which is a classification of the type of compiler). It doesn't matter what the source or target language is (depending on their level they may have a more specific name, but they are all compilers). – Keith Nicholas Aug 13 '15 at 02:41
  • @RonBeyer My understanding was pretty much what Keith Nicholas wrote above me. If you are correct though, does that mean you consider Cfront a conversion tool? Also the Microsoft blog in Israel has a series of articles that build a compiler that generates C : [link](http://blogs.microsoft.co.il/sasha/tag/compiler/). – user3574076 Aug 13 '15 at 14:34
  • I can swallow my pride and say I was wrong, personally when I think of "compiler" I think of something that is generating lower level code for the interpreter or a linker/assembler, but yes the accepted definition of compiler seems to be a translation tool. – Ron Beyer Aug 13 '15 at 14:39

1 Answer

Your decision generally depends on the design and paradigms of your language. If your language is small and does not include complex object-oriented features, then only the non-object-oriented features of IL will be used, and the difference comes down to:

  1. The availability of the .NET virtual machine and BCL vs. the C standard library for the purpose of implementing the language. This includes memory management capabilities and the implementation of primitive types, such as ints and strings.
  2. The code generation: stack-based IL vs. high-level C syntax. Of course, it can be easier to generate high-level constructs of another language (you do not need to cover all of C's grammar, you can use just what you need), but for learning purposes it is more useful to learn how to generate low-level instructions like IL opcodes. And don't forget: it is a good idea to split your tool into a frontend and a backend, as is done in every solid compiler. Then you can plug in different backends for code generation.
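
To make point 2 more concrete, here is a rough sketch of one frontend feeding two backends. It is in Python only to keep it short (the question's compiler would be in C#), and everything in it is made up for illustration: the `Num`/`BinOp` tree, the `emit_c`, `emit_stack` and `run_stack` names, and the textual instruction format. The stack backend borrows the real CIL mnemonics `ldc.i4`, `add`, `sub` and `mul`, but it only produces strings; it does not use Reflection.Emit or produce a real assembly.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical AST that a frontend (lexer + parser) would produce.
@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str  # one of '+', '-', '*'
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]


def emit_c(node: Expr) -> str:
    """Backend 1: emit a C expression; nesting mirrors the tree."""
    if isinstance(node, Num):
        return str(node.value)
    return f"({emit_c(node.left)} {node.op} {emit_c(node.right)})"


# Operator -> stack instruction (mnemonics borrowed from CIL).
STACK_OPS = {"+": "add", "-": "sub", "*": "mul"}


def emit_stack(node: Expr) -> List[str]:
    """Backend 2: emit IL-like stack code via a post-order walk."""
    if isinstance(node, Num):
        return [f"ldc.i4 {node.value}"]
    return emit_stack(node.left) + emit_stack(node.right) + [STACK_OPS[node.op]]


def run_stack(code: List[str]) -> int:
    """Tiny evaluator for the stack code, only to check the output."""
    stack: List[int] = []
    for instr in code:
        if instr.startswith("ldc.i4 "):
            stack.append(int(instr.split()[1]))
            continue
        right = stack.pop()
        left = stack.pop()
        if instr == "add":
            stack.append(left + right)
        elif instr == "sub":
            stack.append(left - right)
        elif instr == "mul":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return stack.pop()


# (1 + 2) * 3, as the frontend would hand it to either backend.
tree = BinOp("*", BinOp("+", Num(1), Num(2)), Num(3))
print(emit_c(tree))
print("\n".join(emit_stack(tree)))
```

The C backend gets expression nesting for free from the target syntax, while the stack backend has to decide the evaluation order itself (operands first, then the operator). That is the part worth learning, and both backends share the same tree.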

PROS for IL:

  • a more solid learning process and a complete result: your compiler will not require any other tools and will be self-sufficient;
  • the presence of the BCL and the resource-management layers of the CLR;
  • the ability to bootstrap your compiler by interoperating with C# code;
  • unique experience with the .NET platform, which is useful if you plan to improve your C# and .NET skills.

PROS for C:

  • the ability to use existing backends to generate platform code and perform optimizations; you can compile your C output for every platform that has a C compiler;
  • no dependency on the CLR: you will not need the .NET Framework or Mono to run the produced output. Today Mono is mature and runs on both Mac and Linux, but the choice remains: IL or platform code.

Many modern languages compile to other high-level languages (there are tons of something-to-JS tools today!), and some languages are even DESIGNED to be compiled to another high-level language (CoffeeScript to JavaScript). Don't forget that you have other options too, for example the LLVM intermediate representation.

Nipheris