5

I'm looking for lexical analysis and parser-generating utilities that are not Flex or Bison. Requirements:

  • Parser is specified using a context-free LL(*) or GLR grammar. I would also consider PEGs.
  • Integrates tightly with a programming language that could be used for both scripting and application development. Language should also have facilities for easily interfacing with C. Good examples are Python, Ruby, and Guile. No C, Java, or Perl please. I want the language to be homogeneous; I want the parser generator to output code in the same language.
  • Well-documented and production-quality.
  • Open source. Free is also desirable (although not required).
  • Compatible with Linux distributions or one of the open source BSDs. I would consider OpenSolaris.
  • Rapid development is a considerably greater concern than efficiency.
  • Suited to parsing natural language as well as formal languages. Natural language parsing is limited to short, simple sentences with very little ambiguity.

I have my eye on ANTLR, although I have never used it. Comments to that effect are appreciated. Let me know what your favorite utilities are that meet these requirements, and why you would recommend them.

Brian Tompsett - 汤莱恩
  • How about http://pyparsing.wikispaces.com/Introduction or http://www.dabeaz.com/ply/ – GWW Feb 13 '11 at 04:42
  • 2
    Given your constraints, what's the objection to flex/bison? – Ira Baxter Feb 14 '11 at 20:45
  • Every LALR grammar is an LR grammar by definition. Furthermore, insisting on LR-not-LALR parser generators mostly gets you huge tables without a lot of additional practical benefit. So I'm not sure why you insist(?) on non-LALR parser generators. If your focus is only on generating code for Python, Ruby, Guile, then I understand better. FWIW, I don't think ANTLR generates any of these. But I'm still puzzled: if your language of choice "easily interfaces with C" (e.g. Python), bison is still a fine choice: just use your language to call Bison's generated C code. – Ira Baxter Feb 15 '11 at 16:24
  • My experience is with Bison and LALR grammars, which are not sufficient to fulfill my needs this time; I'm looking for a tool with a more expressive grammar. I believed a canonical LR or LL(*) grammar would qualify. However, it now seems a GLR grammar might be a better choice. As for the languages, these parsers are to be generated as part of an extension framework for a base program, which is written in C. I want the extension language to be homogeneous, and I do not want it to be C. ANTLR is capable of generating Python. – Jerrad Genson Feb 16 '11 at 07:03

3 Answers

2

There is a list of modern Packrat parsers here.
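
For readers unfamiliar with the term: a packrat parser is essentially a recursive-descent parser for a PEG that memoizes each (rule, position) result. The following is only a minimal hand-written sketch in Python, not the output of any tool on that list; the toy grammar and the names `make_parser` and `parse` are made up for illustration.

```python
from functools import lru_cache

DIGITS = "0123456789"

def make_parser(text):
    """Build a packrat-style parser for the toy PEG (an assumption, not a real tool's grammar):
         Expr <- Term ('+' Term)*
         Term <- [0-9]+ / '(' Expr ')'
    Each rule maps a position to (value, next_position), or None on failure."""

    @lru_cache(maxsize=None)  # memoizing per position is what makes it "packrat"
    def expr(pos):
        first = term(pos)
        if first is None:
            return None
        value, pos = first
        while pos < len(text) and text[pos] == "+":
            nxt = term(pos + 1)
            if nxt is None:
                break  # PEG repetition: stop and keep what matched so far
            value += nxt[0]
            pos = nxt[1]
        return value, pos

    @lru_cache(maxsize=None)
    def term(pos):
        # Ordered choice: try the number alternative first, then parentheses.
        end = pos
        while end < len(text) and text[end] in DIGITS:
            end += 1
        if end > pos:
            return int(text[pos:end]), end
        if pos < len(text) and text[pos] == "(":
            inner = expr(pos + 1)
            if inner is not None and inner[1] < len(text) and text[inner[1]] == ")":
                return inner[0], inner[1] + 1
        return None

    return expr

def parse(text):
    """Parse the whole string and return the sum it denotes."""
    result = make_parser(text)(0)
    if result is None or result[1] != len(text):
        raise ValueError(f"cannot parse {text!r}")
    return result[0]
```

Because the grammar is just ordinary functions in the host language, this style needs no separate table-generation step, which is part of why PEG tools tend to embed easily in scripting languages.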

SK-logic
  • I believe the OP has required the use of a context-free grammar, which PEGs don't appear to fall into. Still, maybe there's a chance that the OP will find PEGs acceptable. – C. K. Young Feb 13 '11 at 17:02
  • 1
    Any context-free LL(*) or LR grammar can be represented as a PEG, so in practice this difference does not matter. Automaton-based parser generators do not fit well with the OP's second requirement, whereas PEGs can be easily integrated into almost any language. – SK-logic Feb 13 '11 at 17:13
1

NL text tends to have lots of ambiguity. If you want to parse natural language, I don't think any of the classic compiler-type parser generators (LALR, LL [including ANTLR]) will help you much; they typically don't handle ambiguity at all.

A GLR parser, which does handle ambiguity, may be of some use; bison offers this as an option.
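
To make "handles ambiguity" concrete, here is a rough sketch in Python that returns every parse of a sentence instead of committing to one. It uses a memoized CYK-style chart rather than GLR, so it only illustrates the idea of keeping all alternatives; the grammar, the lexicon, and the name `all_parses` are hypothetical.

```python
from functools import lru_cache

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
BINARY = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # the PP attaches to the verb phrase
    ("NP", "PP"): ["NP"],   # the PP attaches to the noun phrase
    ("Det", "N"): ["NP"],
    ("P", "NP"): ["PP"],
}
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}

def all_parses(words, start="S"):
    """Return every parse tree of `words` rooted at `start`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def spans(i, j):
        """Every (symbol, tree) pair that derives words[i:j]."""
        if j - i == 1:
            return tuple((sym, (sym, words[i]))
                         for sym in LEXICON.get(words[i], ()))
        found = []
        for k in range(i + 1, j):              # try every split point
            for lsym, ltree in spans(i, k):
                for rsym, rtree in spans(k, j):
                    for parent in BINARY.get((lsym, rsym), ()):
                        found.append((parent, (parent, ltree, rtree)))
        return tuple(found)

    return [tree for sym, tree in spans(0, len(words)) if sym == start]

trees = all_parses("I saw the man with the telescope".split())
print(len(trees))  # two readings: the PP modifies "saw" or "the man"
```

A deterministic LALR or LL parser would have to pick one attachment for "with the telescope" at grammar-design time; an ambiguity-tolerant parser returns both and leaves the choice to a later stage.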

Ira Baxter
0

Guile 2.0 (to be released in a few days) has an LALR(1) parsing library.

C. K. Young