21

Where can I get a paper, doc, or anything else that describes how a Haskell compiler actually works? I read quite a few of GHC's docs, but stopped after getting a headache. Something that doesn't require a PhD to understand and isn't written in a you're-supposed-to-already-be-familiar-with-it style would be preferable. It's not a problem if it's really long and takes some time to work through, though.

PS: Most interesting would be something about GHC, but anything is ok.

fuz
  • 88,405
  • 25
  • 200
  • 352
  • Excellent question. I'd particularly like to know whether a CPS transform à la Scheme is used or not. I strongly believe it is "the" way to implement functional languages, but I may be overlooking the difficulties. – Alexandre C. Dec 08 '10 at 08:10
  • 2
    If you want a grasp of what CPS is, the excellent "Lambda: The Ultimate GOTO" paper from one of the Scheme authors is a great read. – Alexandre C. Dec 08 '10 at 08:11
  • 1
    Not exactly what you're asking for, but I'd recommend looking at compilers other than GHC to start. JHC's source is extremely readable, and the UHC code has a lot of good theoretical documentation; either of these would be easier going than GHC. – John L Dec 08 '10 at 10:19
  • @John: Good point. I already tried, but found that you're totally lost as soon as you don't really understand what the programmer is trying to do. – fuz Dec 08 '10 at 13:08
  • @Alexandre C. Where can I get it? – fuz Dec 08 '10 at 14:28
  • @FUZxxl: you didn't even try Google, did you? – Alexandre C. Dec 08 '10 at 19:49
  • @Alexandre C: For the question itself, I did, but not for your comment. I probably should. – fuz Dec 09 '10 at 01:16
  • The question was answered, but not some comments. To set the record straight: no, @AlexandreC, a CPS transformation is not involved, instead, lambda lifting is used to transform the program to a set of supercombinators to make compiled graph reduction possible. It is possible to argue that there is an equivalence between CPS and the STG. And Scheme, today, counts as an imperative language. No prior PhD is necessary, but the full answer to the question is more work than a PhD (not original, though). – migle Nov 27 '15 at 11:47
  • 1
    One of the best papers on that subject that I have read is "The Glasgow Haskell Compiler" by Simon Marlow and Simon Peyton Jones, from The Architecture of Open Source Applications: https://www.aosabook.org/en/ghc.html – The_Ghost Jun 13 '19 at 15:25
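
To make the lambda-lifting remark in the comments above a little more concrete, here is a minimal, made-up Haskell sketch (the function names are invented for illustration; this is not GHC's own code). A local lambda that captures a free variable is rewritten into a top-level supercombinator by passing that variable explicitly:

    -- Before lifting: the lambda passed to map captures the free variable n.
    addAllBefore :: Int -> [Int] -> [Int]
    addAllBefore n xs = map (\x -> x + n) xs

    -- After lifting: the lambda has become a top-level combinator with no
    -- free variables; the formerly free variable n is now an explicit argument.
    addN :: Int -> Int -> Int
    addN n x = x + n

    addAllAfter :: Int -> [Int] -> [Int]
    addAllAfter n xs = map (addN n) xs

    main :: IO ()
    main = print (addAllBefore 3 [1, 2, 3], addAllAfter 3 [1, 2, 3])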

6 Answers

31

You can get an answer from the horse's mouth! Simon Peyton Jones (GHC wizard) wrote a book explaining how to implement functional programming languages. It's available for free online since it's now out of print: http://research.microsoft.com/en-us/um/people/simonpj/papers/pj-lester-book/

Of course, GHC has moved on since the book was written, but it's still very relevant.

Max Bolingbroke
  • 2,089
  • 18
  • 18
  • 4
    And don't forget the STG paper -- "Implementing Lazy Functional Languages on Stock Hardware": http://research.microsoft.com/apps/pubs/default.aspx?id=67083 – sclv Dec 08 '10 at 21:57
  • And the videos linked on the GHC developers' wiki. – Simon Michael Dec 09 '10 at 06:47

17

Are you looking for details specifically about compiling lazy evaluation? There is Simon Peyton Jones's book mentioned by Max Bolingbroke, and the book detailing Clean's implementation is also online:

http://wiki.clean.cs.ru.nl/Functional_Programming_and_Parallel_Graph_Rewriting

If you have a university affiliation and want something smaller you could try to get these books (Henderson & Diller are certainly out of print):

Antoni Diller "Compiling Functional Languages" ISBN 0 471 92027 4

Peter Henderson "Functional Programming: Application and Implementation" ISBN 0-13-331579-7

AJT Davie "An Introduction to Functional Programming Systems using Haskell" ISBN 0 521 27724 8

Diller has a full compiler for a lazy language (implemented in Pascal) via combinator reduction. This was the implementation technique invented by David Turner for SASL. Henderson has many parts of a compiler for LISPkit, a miniature, lazy variant of Lisp. Davie details quite a bit of the machinery for compiling a lazy language; for instance, there's a description of the STG that's much shorter than Simon Peyton Jones's book (the STG is the abstract machine SPJ used for Haskell).
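
To give a flavour of the combinator reduction Diller and Turner describe, here is a small, self-contained Haskell sketch (purely illustrative; real implementations compile lambda terms to combinators via bracket abstraction and evaluate them by graph reduction with sharing):

    -- A toy term language of S/K/I combinators plus free variables.
    data Comb = S | K | I | Var String | App Comb Comb
      deriving Show

    -- One outermost reduction step, if any rule applies.
    step :: Comb -> Maybe Comb
    step (App I x)                 = Just x
    step (App (App K x) _)         = Just x
    step (App (App (App S f) g) x) = Just (App (App f x) (App g x))
    step (App f x)                 = case step f of
                                       Just f' -> Just (App f' x)
                                       Nothing -> fmap (App f) (step x)
    step _                         = Nothing

    -- Reduce to normal form (no termination guarantee in general).
    eval :: Comb -> Comb
    eval c = maybe c eval (step c)

    -- S K K behaves like the identity combinator: S K K x reduces to x.
    main :: IO ()
    main = print (eval (App (App (App S K) K) (Var "x")))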

The Clean developers have quite a bit of info on implementing SAPL (a Simple Applicative Language) if you look through their publications list:

https://clean.cs.ru.nl/Publications

Finally there are quite a number of papers documenting aspects of the Utrecht Haskell Compiler UHC (and EHC). I think most of the information is how the compiler is organized (with attribute grammars and "Shuffle") and how the type systems (there are various levels of type system in EHC) are implemented, rather than how the back-end 'compilation' works.

stephen tetley
  • 4,465
  • 16
  • 18

5

Compilers are a huge subject and it would be impossible to explain them in their entirety here. But here's an overview of a general compiler. Hopefully this will give you enough background to make material specifically about GHC a little easier to follow.

Compilers generally work by a series of transformations, split into two parts: the front-end and the back-end.

The first transformation is turning plain text into something a little easier to traverse. This itself is usually split into two parts:

Lexical Analysis or Tokenization - The act of transforming plain text into small chunks (typically operators, identifiers, literals, etc.).

Syntactic Analysis or Parsing - Turning these small chunks into a tree structure, typically an AST (Abstract Syntax Tree).
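
For example, a toy front end for a tiny expression language might look like this (all names here are invented for the example and have nothing to do with GHC's actual lexer and parser):

    -- Output of lexical analysis: a flat list of tokens.
    data Token = TNum Int | TPlus
      deriving Show

    -- Output of syntactic analysis: an abstract syntax tree.
    data Expr = Num Int | Add Expr Expr
      deriving Show

    -- Tokenization: plain text into small chunks.
    tokenize :: String -> [Token]
    tokenize s = map toTok (words s)
      where toTok "+" = TPlus
            toTok w   = TNum (read w)

    -- Parsing: tokens into a tree (left-associative sums like "1 + 2 + 3").
    parse :: [Token] -> Expr
    parse (TNum n : rest) = go (Num n) rest
      where go e (TPlus : TNum m : more) = go (Add e (Num m)) more
            go e []                      = e
            go _ _                       = error "parse error"
    parse _               = error "parse error"

    main :: IO ()
    main = print (parse (tokenize "1 + 2 + 3"))

A real front end would of course track source locations, report errors properly, and handle a much richer grammar.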

The next stage is semantic analysis. In this stage a compiler will usually add information to the AST (like type information) and build a symbol table. That concludes the front-end.

The next transformation turns the AST into an IR, an Intermediate Representation. Nowadays this is generally an SSA form (Static Single Assignment).

This is then optimized via constant propagation, dead-code elimination, vectorisation, etc.

The last transformation is code generation: transforming the IR into machine code. This can be very complicated, and it is also sometimes referred to as lowering.
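
As a rough, made-up illustration of these back-end stages, here is a tiny assignment-style IR in Haskell together with a constant-propagation pass. GHC's real intermediate languages (Core, STG, Cmm) are quite different, so treat this only as a sketch of the general idea:

    import qualified Data.Map as M

    -- A made-up, SSA-ish IR: each variable is assigned exactly once,
    -- which makes constant propagation a simple single pass.
    data Operand = Const Int | Ref String
      deriving Show

    data Instr
      = Assign String Operand           -- x := operand
      | AddI   String Operand Operand   -- x := a + b
      deriving Show

    -- Remember which variables are known constants and fold additions
    -- whose operands are all known.
    propagate :: [Instr] -> [Instr]
    propagate = go M.empty
      where
        go _   []       = []
        go env (i:rest) =
          case i of
            Assign x op ->
              let op' = subst env op
              in Assign x op' : go (remember x op' env) rest
            AddI x a b ->
              case (subst env a, subst env b) of
                (Const m, Const n) ->
                  let op' = Const (m + n)
                  in Assign x op' : go (remember x op' env) rest
                (a', b') -> AddI x a' b' : go env rest

        subst env (Ref v) = maybe (Ref v) Const (M.lookup v env)
        subst _   op      = op

        remember x (Const n) env = M.insert x n env
        remember _ _         env = env

    main :: IO ()
    main = mapM_ print (propagate [ Assign "a" (Const 1)
                                  , Assign "b" (Const 2)
                                  , AddI   "c" (Ref "a") (Ref "b")
                                  , AddI   "d" (Ref "c") (Ref "x") ])

The other optimizations mentioned above (dead-code elimination and so on) would be further rewrites over the same instruction list before code generation lowers it to machine code.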

For more information I recommend this Wikipedia page.

dan_waterworth
  • 6,261
  • 1
  • 30
  • 41
  • 2
    Thank you for your comment. Please notice that I asked specifically about Haskell, where these steps are a bit more sophisticated: e.g. Haskell allows new operators with arbitrary precedence, which turns parsing into guesswork; type inference and type classes, which IMHO make the analysis of what's going on quite a bit more challenging; and a big bunch of optimizations just to get rid of laziness. But thanks for your clear answer! – fuz Dec 08 '10 at 13:05

4

Unfortunately, I suspect that what you're looking for doesn't exist. Compiler theory and Formal Language theory are reasonably complex topics in Computer Science, and Haskell is by no means a starting point.

First, you should probably get a good grounding in compiler theory and formal language theory.

I suspect that anything explaining the internals of Haskell would require a substantially better understanding of the above topics than, say, C would.

I've taken a single course on the subject so far, so I have no formal literature to recommend, but I'm sure there exist many good sources.

Jamie Wong
  • 18,104
  • 8
  • 63
  • 81
  • I guess you're right. It's just that I want to understand what's going on there. IMHO it's easier to understand what your program is doing, if you actually know how it gets compiled. – fuz Dec 08 '10 at 13:06
  • 1
    Your answer isn't very helpful. Haskell is so different from mainstream languages. – Wei Hu Dec 08 '10 at 20:32
  • 2
    @FUZxxl, I sympathise with your point of view; with procedural languages this is very much the case, and especially something like C. However with Haskell the distance from language to machine is much greater, so you spend much more time thinking in terms of the language model. The only time this isn't true is when thinking about performance. However understanding the whole compiler isn't a lot of use. Learn about thunks, strictness analysis and the intermediate code emitted by GHC. – Paul Johnson Dec 09 '10 at 00:02
  • @Paul Johnson Aren't thunks, strictness analysis, and intermediate code some of the major parts of the compiler? Probably I should have asked about those instead of about the whole beast itself. – fuz Dec 09 '10 at 01:18
  • 1
    What I mean is, you need to learn the concepts rather than the algorithms. It's like how the way to understand square roots is to understand that sqrt(x)^2 = x, not to read about Newton-Raphson. – Paul Johnson Dec 20 '10 at 12:53

2

Wikipedia has a good, readable overview of the internals of GHC (similar to @dan_waterworth's explanation, but specific to Haskell and GHC):

http://en.wikipedia.org/wiki/Glasgow_Haskell_Compiler#Architecture

amindfv
  • 8,438
  • 5
  • 36
  • 58

0

One of the best papers on that subject that I have read is "The Glasgow Haskell Compiler" by Simon Marlow and Simon Peyton Jones, from The Architecture of Open Source Applications: https://www.aosabook.org/en/ghc.html

The_Ghost
  • 2,070
  • 15
  • 26
  • Link-only answers are discouraged on this site. If you just want to point out a resource, please make a comment instead. – fuz Jun 13 '19 at 10:46
  • Okay, sorry! I just wanted to help with a good article on that question. I have just added a comment with the content of this answer. – The_Ghost Jun 13 '19 at 15:26
  • Cool! Thank you for your cooperation. – fuz Jun 13 '19 at 15:48