17

About once a year I have to develop, or at least design, a grammar and a parser - that seems to be a constant of my working life.

Every time I face this task, thus about once a year, I, quite a lex/yacc (flex/bison resp.) guy, consider or reconsider alternatives to plain lex/yacc, and after some musing and trying I get back to plain lex/yacc.

Because I have a CORBA server at the hub of the application, I can call into it from a parser written in almost any language, so this time I had a look at

  • antlr4 (Java) and antlr3 (Java, but with runtimes for other languages),
  • SableCC (Java),
  • Parse::EBNF, Parse::Yapp and Marpa (Perl),
  • and SimpleParse (Python).

For me, the tandem of antlr4 with antlrworks looked like the most promising candidate, but I'm not yet convinced that the time spent on getting into it will be amortized in the end.


The grammar I have to develop is similar to SQL DDL (in terms of structure, not in terms of subject matter).

Why would any of these alternatives make my task easier than using plain lex/yacc?

Solkar
  • 1,228
  • 12
  • 22
  • I think this is a question like "which programming language should I use", which is unlikely to attract the sort of factual objective answer SO promotes. So voted to close as non-constructive. However, the question for you is: what is it about lex/flex/yacc/bison that you find unsatisfactory? That would at least give you a clue about what features to seek. If it's just "I'd like to try something new," then flip a coin :) – rici May 13 '13 at 16:02
  • It's not comparable. If all generators generated the same parser I would agree, but the outcome is completely different depending on the parser generator. – Mike Lischke May 14 '13 at 12:42

2 Answers

13

What you should also consider is that the various parser generators generate quite different parsers. Yacc/bison produce bottom-up parsers, which are often hard to understand, hard to debug, and give weird error messages. ANTLR, for instance, produces a recursive-descent top-down parser, which is much easier to understand; you can actually debug it easily, and you can even use a single subrule for a parse operation (e.g. just parse expressions instead of the full language).

Additionally, its error recovery is way better and produces much cleaner error messages. There are various IDEs/plugins/extensions that make working with ANTLR grammars pretty easy (ANTLRWorks, the IntelliJ plugin, the Visual Studio Code extension etc.). And you can generate parsers in different languages (C, C++, C#, Java and more) from the same grammar (unless you have language-specific actions in your grammar, which you mentioned in your question already). And while we speak of actions: due to the evaluation principle in bottom-up parsers (shift token, shift token, reduce them to a new token and shift it, etc.), actions can easily cause trouble there, e.g. by executing more than once. Not so with parsers generated by ANTLR.
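To give a flavor of how readable such a top-down grammar is (a minimal sketch; the grammar and rule names like ddl and columnDef are invented here for illustration, not taken from the question): an ANTLR4 rule reads almost like the BNF you would write on paper, and the generated parser exposes one method per rule, which is what makes parsing just a subrule possible.

```antlr
// Hypothetical DDL-like fragment; all names are invented for illustration.
grammar MiniDdl;

ddl         : statement+ EOF ;
statement   : createTable ';' ;
createTable : 'CREATE' 'TABLE' ID '(' columnDef (',' columnDef)* ')' ;
columnDef   : ID type ;
type        : 'INT' | 'TEXT' ;

ID : [a-zA-Z_] [a-zA-Z_0-9]* ;
WS : [ \t\r\n]+ -> skip ;
```

Since each rule becomes a parser method (e.g. columnDef()), you can start a parse at any rule instead of always going through the start symbol.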

I also tried various parser generators over the years, and even wrote my own, but I would recommend ANTLR as the tool of choice any time.

Mike Lischke
  • 48,925
  • 16
  • 119
  • 181
5

The latest Marpa is Marpa::R2, which has great improvements in "whipituptude", including a very convenient new DSL interface, which is itself written in Marpa. You might consider starting with Marpa, for "prototyping". Marpa is highly declarative, using clean BNF. If you migrate away, you can take most of your work to the new parser. Marpa is unsurpassed in its error handling and detection, also very handy in a prototyping phase.

Marpa parses all the classes of grammar parsed by the other parsers listed in linear time, and is unsurpassed in its flexibility. Its newest feature allows you to switch back and forth from Marpa to your own parsing code. So you might even stay with it. There is a website, and my blog has a series of tutorials, which may be the best way to get introduced to Marpa.
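As a rough sketch of what this looks like in practice (modeled loosely on the style of the Scanless::DSL synopsis; the DDL-ish rule names are invented, and API details may differ between Marpa versions): you write the grammar as BNF in the SLIF DSL and hand it to the Scanless interface.

```perl
use Marpa::R2;

# Hypothetical DDL-like grammar; the rule names are invented for illustration.
my $dsl = <<'END_OF_DSL';
:default ::= action => [values]
:start ::= statement
statement  ::= 'create' 'table' name
name         ~ [\w]+
:discard     ~ whitespace
whitespace   ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
my $input   = 'create table users';
$recce->read( \$input );
my $value_ref = $recce->value();
```

The G1 rules (::=) are the structural grammar, the L0 rules (~) are the lexical level, and :discard handles whitespace, so the BNF itself stays close to what you would put in documentation.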

Jeffrey Kegler
  • 841
  • 1
  • 6
  • 8
  • Thx for your in-depth reply, Jeffrey! Pls. consider SQL DDL as an example. Why exactly would Marpa do better than lex & yacc with that? – Solkar May 13 '13 at 17:21
  • With Marpa you just type in BNF (any BNF) and it runs. Being experienced with yacc, you know that with LALR it ain't so easy. Marpa also knows, and can report, exactly where it is in the parse at all times, making error detection, debugging and maintenance far easier. – Jeffrey Kegler May 13 '13 at 18:41
  • Current SQL (or subset) implementations in Marpa are proprietary (alas) but a fragment from one is in the test suite. – Jeffrey Kegler May 13 '13 at 18:48
  • I'm not working on an SQL cmd shell; this was just an example of a well-known grammar which has some structural similarities with my grammar-to-be, but I take this "Marpa also knows and can report exactly where it is in the parse at all times" feat you mentioned as a surplus value. Talking about "typing in BNF" - that means mixing keywords, delimiters, actions etc. embedded in Perl structures. Are there simple means for extracting the pure grammar in a documentation-friendly form? – Solkar May 13 '13 at 22:15
  • I'm not 100% sure I understood the question, but if you use Marpa's SLIF interface, the grammar will already be in a form close to BNF/EBNF, which is what I think is intended by "documentation-friendly". For an example of SLIF, see the Synopsis in https://metacpan.org/module/JKEGL/Marpa-R2-2.052000/pod/Scanless/DSL.pod. – Jeffrey Kegler May 14 '13 at 00:42
  • For documentation, e.g., I would need the plain BNF grammar without actions included, let alone any implementation-language constructs like Perl hashes holding them. For lex/yacc I have a bunch of scripts which do the job quite nicely; I would somehow expect that a package of Marpa's size has its own means for doing that. – Solkar May 14 '13 at 07:36