I'm currently looking for a lexer/parser that generates Scala code from a BNF grammar (an ocamlyacc file with precedence and associativity). I'm quite confused since I found almost nothing on how to do it.

For parsing, I found scala-bison (which I am having a lot of trouble working with). All the other tools are just Java parsers imported into Scala (like ANTLR).

For lexing, I found nothing.

I also found the famous parser combinators of Scala, but (correct me if I'm wrong), even if they are quite appealing, they consume a lot of time and memory, mainly due to backtracking.

So I have two main questions:

  • Why do people only seem to concentrate on parser combinators?
  • What is your best lexer/parser generator suggestion to use with Scala?
nbro
Vinz

3 Answers


As one of the authors of the ScalaBison paper, I have run into this issue a few times. :-) What I would usually do for scanning in Scala is use JFlex. It works surprisingly well with ScalaBison, and all of our benchmarking was done using that combination. The unfortunate downside is that it does generate Java sources, and so compilation takes a bit of gymnastics. I believe that John Boyland (the main author of the paper) has developed a Scala output mode for JFlex, but I don't think it has been publicly released.

For my own development, I've been working a lot with scannerless parsing techniques. Scala 2.8's packrat parser combinators are quite good, though still not generalized. I've built an experimental library which implements generalized parsing within the parser combinator framework. Its asymptotic bounds are much better than those of traditional parser combinators, but in practice the constant-factor overhead is higher (I'm still working on it).
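
To make the packrat combinators concrete, here is a minimal sketch of a left-recursive arithmetic grammar with the usual precedence and associativity. It assumes the `scala-parser-combinators` module is on the classpath (it shipped with the standard library in Scala 2.8 and became a separate module from 2.11 onwards). The names `Arith` and `eval` are made up for illustration and do not come from any of the tools discussed here.

```scala
import scala.util.parsing.combinator.{PackratParsers, RegexParsers}
import scala.util.parsing.input.CharSequenceReader

// Hypothetical example grammar; `Arith` and `eval` are illustrative names.
object Arith extends RegexParsers with PackratParsers {
  // Lowest precedence: + and -, left-associative through left recursion.
  lazy val expr: PackratParser[Int] =
    (expr ~ ("+" ~> term)) ^^ { case l ~ r => l + r } |
    (expr ~ ("-" ~> term)) ^^ { case l ~ r => l - r } |
    term

  // Higher precedence: *, also left-associative.
  lazy val term: PackratParser[Int] =
    (term ~ ("*" ~> factor)) ^^ { case l ~ r => l * r } |
    factor

  // Highest precedence: integer literals and parenthesised expressions.
  lazy val factor: PackratParser[Int] =
    """\d+""".r ^^ (_.toInt) | ("(" ~> expr <~ ")")

  // Wrap the input in a PackratReader so that results are memoized.
  def eval(s: String): Option[Int] =
    parseAll(expr, new PackratReader(new CharSequenceReader(s))) match {
      case Success(value, _) => Some(value)
      case _                 => None
    }
}
```

Each precedence level gets its own rule, and declaring the rules as `lazy val`s of type `PackratParser` is what lets the left-recursive alternatives terminate. With this sketch, `Arith.eval("1 - 2 - 3")` should group as `(1 - 2) - 3`, which is the left associativity an ocamlyacc `%left` declaration would give.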

Daniel Spiewak
  • Thanks for the answer and your gll combinators, I'll try to understand how it works :) But I think I'll try to play with JFlex and Scala together. – Vinz Jun 23 '10 at 09:47
  • Thanks to the many tutorials (including some of yours on codecommit), I finally managed to write a simple lexer/parser with parser combinators, without too much recursion. Thanks again! – Vinz Jun 23 '10 at 20:06

I know that this question is old, but for those still in search of a lexer generator that outputs Scala code, I've written a fork of JFlex that emits Scala rather than Java, including corresponding Maven and sbt plugins. All are now available on Maven Central.

We're currently using it (including the Maven/sbt plugins) to tokenize English text as part of the natural language processing pipeline in FACTORIE -- example .flex file containing Scala here.

Emma Strubell

Scala 2.8 has a packrat parser. I quote from the API docs here:

Packrat Parsing is a technique for implementing backtracking, recursive-descent parsers, with the advantage that it guarantees unlimited lookahead and a linear parse time. Using this technique, left recursive grammars can also be accepted.
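
To illustrate where the linear parse time in that description comes from, here is a small dependency-free sketch of the memoization idea. It is not the standard library's implementation: the class `MiniPackrat`, its `memoize` flag, and the `ruleEvaluations` counter are all hypothetical names for this example. The grammar is deliberately right-recursive, so this sketch does not handle left recursion.

```scala
import scala.collection.mutable

// PEG-style grammar (right-recursive on purpose):
//   sum     <- product '+' sum / product
//   product <- atom '*' product / atom
//   atom    <- digits / '(' sum ')'
final class MiniPackrat(input: String, memoize: Boolean = true) {
  // Some((value, nextPosition)) on success, None on failure.
  type Result = Option[(Int, Int)]

  // Memo table keyed by (rule name, input position).
  private val memo = mutable.Map.empty[(String, Int), Result]
  var ruleEvaluations = 0 // counts how many rule bodies actually ran

  private def rule(name: String, pos: Int)(body: => Result): Result =
    memo.get((name, pos)) match {
      case Some(cached) if memoize => cached // reuse the earlier result
      case _ =>
        ruleEvaluations += 1
        val r = body
        memo((name, pos)) = r
        r
    }

  private def expect(c: Char, pos: Int): Option[Int] =
    if (pos < input.length && input(pos) == c) Some(pos + 1) else None

  private def sum(pos: Int): Result = rule("sum", pos) {
    val withPlus = for {
      (l, p1) <- product(pos)
      p2      <- expect('+', p1)
      (r, p3) <- sum(p2)
    } yield (l + r, p3)
    // Backtrack: the second alternative asks for product(pos) again.
    withPlus.orElse(product(pos))
  }

  private def product(pos: Int): Result = rule("product", pos) {
    val withTimes = for {
      (l, p1) <- atom(pos)
      p2      <- expect('*', p1)
      (r, p3) <- product(p2)
    } yield (l * r, p3)
    withTimes.orElse(atom(pos))
  }

  private def atom(pos: Int): Result = rule("atom", pos) {
    val end = input.indexWhere(!_.isDigit, pos) match {
      case -1 => input.length
      case i  => i
    }
    if (end > pos) Some((input.substring(pos, end).toInt, end))
    else for {
      p1      <- expect('(', pos)
      (v, p2) <- sum(p1)
      p3      <- expect(')', p2)
    } yield (v, p3)
  }

  // Succeeds only if the whole input is consumed.
  def parse(): Option[Int] =
    sum(0).collect { case (v, end) if end == input.length => v }
}
```

Every alternative that fails after parsing a prefix causes the same rule to be retried at the same position. With the memo table each (rule, position) pair is computed at most once, so the work is bounded by the number of rules times the input length. Constructing the parser with `memoize = false` on deeply nested parentheses and comparing `ruleEvaluations` shows how quickly plain backtracking grows, and the table itself is the memory cost of packrat parsing.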

Daniel C. Sobral