7

Which lexer/parser generator is the best (easiest to use, fastest) for C or C++? I'm using flex and bison right now, but bison only handles LALR(1) grammars. The language I'm parsing doesn't strictly need unlimited lookahead, but unlimited lookahead would make parsing a lot easier. Should I try ANTLR? Coco/R? Elkhound? Something else?

Lesmana
Zifre
  • What do you mean by "best"? You need to make your question more specific. –  Mar 30 '09 at 14:28
  • What are your requirements? Is LALR(1) not sufficient for you, and if so, in what ways? – Brian Campbell Mar 30 '09 at 14:40
  • I second Brian's question. What do you need to be able to do? How is LALR(1) insufficient? – Scottie T Mar 31 '09 at 03:44
  • By definition, LALR(1) only handles single token lookahead. If you ever look at any LALR(1) grammar for a language like C++, you'll see all kinds of ugly hacks to make it work. – Zifre Mar 31 '09 at 13:04

7 Answers

5

Updated 2015-01-05:

My original answer pointed to a now-deleted question:

There are a bunch of good answers to this question already in What parser generator do you recommend

So here is the list of items that had at least 1 vote, taken from the archive.org copy of the deleted answer:

I've done several flex/bison systems myself, but now I'd replace both with Lemon from SQLite, since it's one tool, re-entrant and thread-safe, and has a streaming/pull-based model.

Community
dajobe
  • Lemon looks really nice, and I've been able to reduce the grammar to LALR(1), so I might use it. – Zifre Sep 27 '09 at 14:48
  • The programmers at stackoverflow need a check in their system before they delete questions to make sure it's not referenced by anyone. Do you remember what some of the suggestions were? – Natalie Adams Dec 25 '14 at 18:19
  • I agree, this answer is now completely useless – paulm Jan 05 '15 at 14:06
3

The bad news is that most real computer languages aren't "LALR(1)", which means you have to resort to considerable hackery to make YACC parse real languages.

If you are designing a DSL, you can use any of the LALR parser generators without a lot of trouble, precisely because you can change the grammar of your DSL when the parser generator squawks. LL parser generators mostly work here too, for the same reason, but the lack of left recursion can be a real pain.
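To make the left-recursion pain concrete, here is a minimal hand-written sketch, not the output of any tool mentioned here. It assumes a hypothetical toy grammar `expr -> expr '-' num | num`, which an LR tool accepts as written. A top-down (LL) parser would recurse forever on that rule, so it has to be rewritten as `expr -> num ('-' num)*`, with the left associativity restored by hand in the loop. The function names are made up for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical toy grammar (illustration only):
//   expr -> expr '-' num | num     left-recursive, fine for LR tools
// Rewritten for a top-down parser as:
//   expr -> num ('-' num)*

// Parses one or more decimal digits starting at pos; returns false if none.
bool parse_num(const std::string& s, std::size_t& pos, long& out) {
    if (pos >= s.size() || !std::isdigit(static_cast<unsigned char>(s[pos]))) {
        return false;
    }
    out = 0;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
        out = out * 10 + (s[pos] - '0');
        ++pos;
    }
    return true;
}

// Parses num ('-' num)*, folding to the left so "10-3-2" means (10-3)-2.
bool parse_expr(const std::string& s, std::size_t& pos, long& out) {
    if (!parse_num(s, pos, out)) {
        return false;
    }
    while (pos < s.size() && s[pos] == '-') {
        ++pos;  // consume '-'
        long rhs = 0;
        if (!parse_num(s, pos, rhs)) {
            return false;  // '-' must be followed by a number
        }
        out -= rhs;
    }
    return true;
}
```

Calling `parse_expr` on `"10-3-2"` with `pos` starting at 0 yields 5, which is the left-associative result the original left-recursive rule would have given for free.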

If you are uncompromising about the way you like your syntax, GLR parsers are hands-down winners. We use them in the DMS Software Reengineering Toolkit and have built production-quality parsers for some 30+ languages including C++, about which there is a folk theorem saying it's nearly impossible to parse. The folk theorem was started by people using LL and LALR parsers to try to handle C++. GLR does it easily.

Ira Baxter
1

ANTLR makes unlimited lookahead very easy using the 'backtrack' option. It might also meet your 'easiest to use, fastest' criteria, since it has ANTLRWorks, which lets you visualize and debug your grammar.
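For readers unfamiliar with what backtracking buys you, here is a rough hand-written sketch of the speculate-and-rewind idea. It is not ANTLR's generated code, and the toy grammar (`assign -> ids '='`, `decl -> ids ';'`, `ids -> 'a'+`, one character per token) and all function names are assumptions made up for illustration. No fixed number of lookahead tokens can pick between the two alternatives up front, because the deciding token sits after an arbitrarily long run of `a`s.

```cpp
#include <cstddef>
#include <string>

// Hypothetical toy grammar (illustration only), one character per token:
//   stmt   -> assign | decl
//   assign -> ids '='
//   decl   -> ids ';'
//   ids    -> 'a'+
enum class Kind { Assign, Decl, Error };

// Matches one or more 'a' characters.
bool match_ids(const std::string& s, std::size_t& pos) {
    std::size_t start = pos;
    while (pos < s.size() && s[pos] == 'a') {
        ++pos;
    }
    return pos > start;
}

// Matches a single expected character.
bool match_char(const std::string& s, std::size_t& pos, char c) {
    if (pos < s.size() && s[pos] == c) {
        ++pos;
        return true;
    }
    return false;
}

bool match_assign(const std::string& s, std::size_t& pos) {
    return match_ids(s, pos) && match_char(s, pos, '=');
}

bool match_decl(const std::string& s, std::size_t& pos) {
    return match_ids(s, pos) && match_char(s, pos, ';');
}

// Tries each alternative in order, rewinding to the start on failure.
Kind parse_stmt(const std::string& s) {
    std::size_t pos = 0;
    if (match_assign(s, pos) && pos == s.size()) {
        return Kind::Assign;
    }
    pos = 0;  // rewind: the speculative attempt failed
    if (match_decl(s, pos) && pos == s.size()) {
        return Kind::Decl;
    }
    return Kind::Error;
}
```

The cost of this convenience is that a failed alternative re-scans input it has already seen, which is one reason backtracking parsers can be slower than a deterministic LALR(1) parser on the same grammar.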

Another advantage is that it makes AST building trivially easy with its built-in support for building ASTs which is missing in bison.

With two books published, 'The Definitive ANTLR Reference' and 'Language Implementation Patterns', it is among the best-documented tools available. It also has a very active mailing list.

Indhu Bharathi
0

The latest bison claims to do unlimited lookahead by (in effect) doing several parses simultaneously. If you already have an investment in bison, it may be worth trying this out rather than switching to another package.

http://www.gnu.org/software/bison/manual/bison.html#GLR-Parsers

I have not used this feature myself, though.

As far as other systems go, I have used ANTLR. I did not particularly like it (the documentation was not very good, and one must manually factor one's grammar to cater for operator precedence), but it did work, and so many swear by it that it is certainly worth looking at.
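As a rough illustration of what "manually factoring for operator precedence" means, here is a hand-written sketch under stated assumptions: a hypothetical grammar with single-digit operands, `+` and `*` only, and one rule per precedence level (`sum -> product ('+' product)*`, `product -> digit ('*' digit)*`). It is not ANTLR output, and the names `PrecParser` and `evaluate` are made up. In bison the same effect would normally come from precedence declarations rather than extra rules.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical factored grammar (illustration only):
//   sum     -> product ('+' product)*
//   product -> digit ('*' digit)*
// Each precedence level gets its own rule; that is the manual factoring.
struct PrecParser {
    std::string text;
    std::size_t pos = 0;
    bool ok = true;

    explicit PrecParser(std::string t) : text(std::move(t)) {}

    // Highest precedence: a single decimal digit.
    long digit() {
        if (pos < text.size() &&
            std::isdigit(static_cast<unsigned char>(text[pos]))) {
            return text[pos++] - '0';
        }
        ok = false;
        return 0;
    }

    // '*' binds tighter than '+', so it is handled one level down.
    long product() {
        long value = digit();
        while (ok && pos < text.size() && text[pos] == '*') {
            ++pos;
            value *= digit();
        }
        return value;
    }

    long sum() {
        long value = product();
        while (ok && pos < text.size() && text[pos] == '+') {
            ++pos;
            value += product();
        }
        return value;
    }
};

// Returns true and stores the value if the whole input is a valid expression.
bool evaluate(const std::string& text, long& out) {
    PrecParser p(text);
    out = p.sum();
    return p.ok && p.pos == p.text.size();
}
```

With this layering, `evaluate("2+3*4", v)` stores 14 in `v`, because the multiplication is grouped inside `product` before `sum` ever sees it. Every additional precedence level needs another rule of the same shape, which is the tedium the answer is describing.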

  • I did try Bison GLR parsing, but it seems to cause some problems with operator precedence and is noticeably slower. ANTLR is hard to use with C++ and I greatly prefer LR over LL style grammars. – Zifre Sep 28 '09 at 22:06
0

LRSTAR 9.1 can generate LR(1) and LR(*) parsers. It is a C++ based system, friendly to Windows and Visual Studio. It creates table-driven parsers and table-driven lexers, which are small and quick to compile. LRSTAR parsers can build an AST automatically.

0

I don't know what you are looking for exactly, but I think that Boost Xpressive is worth looking at ...

It is not exactly a parser generator, but it is a great tool for handling grammars, and I feel it can handle weird ones.

siukurnin
0

I have been using the GOLD parsing system (http://www.devincook.com/goldparser) with very good results. My project is small, a parsing system for NC files in C. But I think the tool can handle more complex projects as well.

Mauricio
  • Goldparser is nice but it is extremely slow. Even in speed-optimized C++ code it takes 10 seconds to parse 15,000 lines of code. Compared with the speed of the PHP parser, this is extremely slow. – Elmue Sep 26 '13 at 03:00