31

Let's assume for the moment that C++ is not a functional programming language. If you want to write a compiler using LLVM for the back-end, and you want to use a functional programming language and its bindings to LLVM to do your work, you have two choices as far as I know: Objective Caml and Haskell. If there are others, then I'd like to know about those too.

I'm not asking for subjective opinions, so please don't give this the subjective tag. I want to make up my own mind about this, but I'm not sure I know what all the trade-offs are. So, Stack Overflow to the rescue. What are the trade-offs?

james woodyatt
  • "Let's assume for the moment that C++ is not a functional programming language." It *never* was one. – R. Martinho Fernandes Nov 20 '09 at 23:44
  • That was a joke. – james woodyatt Nov 21 '09 at 00:09
  • dons's comment below led me to this weblog post about the Haskell LLVM bindings, which goes over a lot of what I wanted to know from the Haskell side: http://augustss.blogspot.com/2009/01/llvm-llvm-low-level-virtual-machine-is.html – james woodyatt Nov 25 '09 at 22:24
  • So, one of the things I've noticed now that I'm getting deeper into this question is that the OCaml and Ada bindings that are distributed with LLVM are basically just thin wrappers over the LLVM C-language subset. The Haskell bindings, written by my friend Bryan O'Sullivan, are enriched with a very Haskell-specific API on top of the FFI wrappers to the LLVM C-language subset. – james woodyatt Dec 27 '09 at 19:18
  • Alas, there are things missing (as of 0.7.0.0) from the enriched Haskell API, e.g. structure types (!). On the other hand, the OCaml bindings are bundled with LLVM, but that doesn't seem to imply anything about how well they are kept in sync with the principal LLVM API. I have found bugs that I would have expected unit tests to catch. Sadly, no. – james woodyatt Dec 27 '09 at 19:18
  • Of course, http://conal.net/blog/posts/the-c-language-is-purely-functional – dubiousjim Dec 23 '11 at 07:41

4 Answers

14

Either OCaml or Haskell would be a good choice. Why not check out the LLVM tutorials for each language? The LLVM tutorial for OCaml is here: http://llvm.org/docs/tutorial/OCamlLangImpl1.html

Haskell has more momentum these days, but there are plenty of good parsing libraries for OCaml as well, including the PEG parser generator Aurochs, Menhir, and the GLR parser generator Dypgen. Also check out this presentation on pcl, a monadic parser combinator library for OCaml (like Parsec for Haskell). It has some good information comparing Haskell's and OCaml's approaches: http://osp.janestreet.com/files/pcl.pdf
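For readers who haven't seen the monadic parser combinator style mentioned above, here is a minimal sketch in plain OCaml. It is not the API of pcl, Parsec, or any other library named in this thread; every name in it (`parser`, `return`, `>>=`, `satisfy`, `many`, `number`, `explode`) is made up for illustration, and it uses only the standard library.

```ocaml
(* A parser takes a list of characters and returns a value plus the
   remaining input, or None on failure. Illustrative only. *)
type 'a parser = char list -> ('a * char list) option

(* Succeed without consuming input. *)
let return (x : 'a) : 'a parser = fun input -> Some (x, input)

(* Monadic bind: run p, then feed its result to f. *)
let ( >>= ) (p : 'a parser) (f : 'a -> 'b parser) : 'b parser =
  fun input ->
    match p input with
    | None -> None
    | Some (x, rest) -> f x rest

(* Accept one character that satisfies the predicate. *)
let satisfy (pred : char -> bool) : char parser = function
  | c :: rest when pred c -> Some (c, rest)
  | _ -> None

(* Zero or more repetitions of p (p must consume input when it succeeds). *)
let rec many (p : 'a parser) : 'a list parser =
  fun input ->
    match p input with
    | None -> Some ([], input)
    | Some (x, rest) ->
      (match many p rest with
       | Some (xs, rest') -> Some (x :: xs, rest')
       | None -> Some ([x], rest))

let is_digit c = c >= '0' && c <= '9'

(* One or more digits, folded into an int. *)
let number : int parser =
  satisfy is_digit >>= fun d ->
  many (satisfy is_digit) >>= fun ds ->
  let digit c = Char.code c - Char.code '0' in
  return (List.fold_left (fun acc c -> acc * 10 + digit c) 0 (d :: ds))

(* Convert a string to a list of characters. *)
let explode s = List.init (String.length s) (String.get s)

(* Parses the leading digits and leaves the rest of the input untouched. *)
let result = number (explode "42+1")
```

Real libraries add error reporting, backtracking control, and better input representations, so treat this only as a picture of the style.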

Some will say that laziness gives Haskell the edge in parsing, but you can get laziness in OCaml as well.
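As a small illustration of opt-in laziness in OCaml, here is a sketch using the built-in `lazy` keyword and `Lazy.force` from the standard library. The `stream` type and the `from` and `take` helpers are made up for this example; a lazy token stream for a parser could be built the same way.

```ocaml
(* A stream whose tail is a suspended computation. *)
type 'a stream = Cons of 'a * 'a stream Lazy.t

(* Infinite stream n, n+1, n+2, ...; each tail is computed only when forced. *)
let rec from n = Cons (n, lazy (from (n + 1)))

(* Take the first k elements, forcing only as much of the stream as needed. *)
let rec take k (Cons (x, rest)) =
  if k <= 0 then [] else x :: take (k - 1) (Lazy.force rest)

let first_five = take 5 (from 1)
```

The difference from Haskell is that laziness here is explicit: you mark what to suspend with `lazy` and decide where to force it.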

aneccodeal
  • Is there a Haskell/LLVM tutorial somewhere? – james woodyatt Nov 21 '09 at 22:54
  • Oh, and the monadic parser combinator in my OCNAE Cf library is my preferred parsing solution in OCaml. I looked at Parsec and was disappointed. If I go with Haskell, I'll probably port my Cf library over to it before I do anything else. – james woodyatt Nov 21 '09 at 22:59
  • @james http://augustss.blogspot.com/ has a long series of posts on the LLVM bindings, including examples. There are several other users of the LLVM bindings (a prototype GHC backend, and a lambda calculus compiler, http://blog.finiteimprobability.com/2009/11/17/a-compiler-for-lambda-calculus-to-llvm-part-1/). – Don Stewart Nov 22 '09 at 19:01
  • @dons thanks! that's the kind of information i was hoping to elicit! – james woodyatt Nov 25 '09 at 22:09
10

Haskell has higher-level bindings to LLVM than OCaml does (the Haskell ones provide some interesting type-safety guarantees), and Haskell has far more libraries available (1700 packages on http://hackage.haskell.org), making it easier to glue components together.
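To give a rough idea of what a type-safety guarantee in a binding layer can look like, here is a sketch using phantom types, written in OCaml so it runs without any LLVM installation. It is an assumption-laden illustration, not the API of the Haskell llvm package or of the OCaml bindings: the `Typed` module and all of its functions are hypothetical, and values are represented as plain strings instead of real LLVM values.

```ocaml
(* Hypothetical typed wrapper; not the API of any real LLVM binding. *)
module Typed : sig
  type i32                      (* phantom tag: 32-bit integer *)
  type f64                      (* phantom tag: double-precision float *)
  type 'a value                 (* an IR value tagged with its IR type *)
  val const_i32 : int -> i32 value
  val const_f64 : float -> f64 value
  val add : i32 value -> i32 value -> i32 value
  val fadd : f64 value -> f64 value -> f64 value
  val to_string : 'a value -> string
end = struct
  type i32
  type f64
  type 'a value = string        (* stand-in representation for the sketch *)
  let const_i32 n = string_of_int n
  let const_f64 x = Printf.sprintf "%F" x
  let add a b = Printf.sprintf "(add i32 %s, %s)" a b
  let fadd a b = Printf.sprintf "(fadd double %s, %s)" a b
  let to_string v = v
end

(* Well-typed: both operands are i32 values. *)
let ok = Typed.(to_string (add (const_i32 1) (const_i32 2)))

(* Mixing operand types, e.g. Typed.(add (const_i32 1) (const_f64 2.0)),
   is rejected by the OCaml type checker at compile time. *)
```

The point is only that a mismatched instruction is caught when the compiler itself is compiled, not when LLVM verifies the generated module.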

Don Stewart
7

Availability of native bindings need not constrain your choice of language. There is a third option, apart from using bindings or generating IR text directly:

You can use a language-neutral serialization format, such as Google's Protocol Buffers, to serve as the bridge from your front-end to your back-end. Protocol buffers are, after all, just ASTs in disguise.

Your front end, implemented in a functional language, then does what it is best at -- parsing, type checking, desugaring, core-to-core transformations, etc -- and the C++ backend takes the IR from your frontend and uses LLVM's feature-complete-by-definition native C++ API to do lowering from your-language-IR to LLVM IR. This makes it much easier to handle "advanced" features of LLVM such as debug metadata.

I'm using this strategy with hprotoc and associated Haskell bindings for protocol buffers, and am very happy with the results. There is much to be said for using the right tool for the job!
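A minimal sketch of the front-end half of this idea, written in OCaml: the AST is flattened to a language-neutral text form that a separate back end could read. To keep it self-contained it uses a made-up S-expression encoding as a stand-in for Protocol Buffers; the `expr` type, the `serialize` function, and the wire format are all assumptions for illustration, not hprotoc output.

```ocaml
(* A toy front-end AST. *)
type expr =
  | Num of int
  | Var of string
  | Add of expr * expr
  | Call of string * expr list

(* Serialize the AST to a simple S-expression string. A real setup
   would emit a Protocol Buffers message here instead. *)
let rec serialize = function
  | Num n -> Printf.sprintf "(num %d)" n
  | Var x -> Printf.sprintf "(var %s)" x
  | Add (a, b) -> Printf.sprintf "(add %s %s)" (serialize a) (serialize b)
  | Call (f, args) ->
    Printf.sprintf "(call %s%s)" f
      (String.concat "" (List.map (fun a -> " " ^ serialize a) args))

(* What the front end would hand over to the back end. *)
let wire = serialize (Add (Num 1, Call ("f", [Var "x"; Num 2])))
```

The back end then only needs a reader for the same format, which is what makes the front-end language a free choice.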

Ben Karel
  • Wow. That's an idea that would never have occurred to me. Now that I've been exposed to it, I'm not sure I feel safe assuming I know what you mean when you say "there is much to be said for using [this strategy]." Could you elaborate? – james woodyatt Feb 09 '11 at 04:53
  • James, I'm not 100% sure what you're asking for elaboration on... But the very last sentence was just a general observation that some jobs are best solved using more than one tool, and there's nothing wrong with that. I think people sometimes forget that using more than one language is workable! – Ben Karel Feb 09 '11 at 15:46
6

OCaml is the only functional language with bindings in the LLVM distro itself and documentation on llvm.org such as the Kaleidoscope tutorial. If you have OCaml installed when you build and install LLVM then it will automatically build and install the LLVM bindings for OCaml as well. Moreover, these OCaml bindings have been in use for years so they are mature and reliable.

I have been developing HLVM in OCaml using the standard LLVM bindings and found OCaml+LLVM to be an extremely powerful combination. HLVM provides tuples, arrays, unions, TCO of all tail calls, generic printing, FFI to C, JIT compilation and parallel garbage collection with a VM weighing in at under 2kLOC of OCaml code that took only a few man-weeks to develop from scratch. HLVM's numerical performance already far exceeds that of today's fastest open source FPLs including OCaml itself. I have published articles in the OCaml Journal describing how LLVM can be used from OCaml for everything from basic expression evaluation to advanced topics such as parallelism and garbage collection. You may also like this mini example.

J D
  • But OCaml is not the only functional language with bindings to LLVM. Haskell can be compiled to LLVM (http://www.cs.uu.nl/wiki/bin/view/Stc/CompilingHaskellToLLVM). OCaml bindings just happen to be shipped as an example. – sastanin Dec 28 '09 at 19:52
  • @jetxee: No, the OCaml bindings are not just shipped as an example. They are used in industrial LLVM-based projects to solve real problems. AFAIK, the Haskell bindings are incomplete, immature and nothing of any significance has ever been done with them. The Haskell->LLVM compiler you cite actually uses a textual interface instead of binary bindings to generate LLVM IR. – J D Dec 29 '09 at 20:14
  • The LLVM bindings are used commercially in software Lennart has written for his employer - in fact, as the code generation layer for a functional programming language. I'm sure they are very robust, though they may indeed be missing some esoteric features. – Max Bolingbroke Feb 11 '11 at 15:32
  • @Max: Interesting. Do you know which of the LLVM bindings Lennart used? – J D Feb 11 '11 at 19:44
  • The ones he contributed to, which I think are the canonical Haskell bindings to LLVM. See http://hackage.haskell.org/package/llvm – Max Bolingbroke Feb 14 '11 at 17:27