How many ways are there to build a parser?

Question

I am learning about ANTLR v4, a parser generator based on the so-called Adaptive LL(*) algorithm. It claims to be a big improvement over the LL(*) algorithm, but I have also heard about other algorithms such as LR.

What are the advantages and limitations of ANTLR's Adaptive LL(*) algorithm (compared with LR)?

Entire books have been written on this subject; I'm afraid this question is much too broad for SO. — Lucas Trzesniewski, Jan 02 '17 at 13:53
IMHO, the topic of creating various grammars and lexers/parsers is of more academic than practical interest. Most of the work done and time spent in a compiler is in the "optimizer". — Alexey Frunze, Jan 02 '17 at 14:14
The ANTLR reference doesn't go into the guts of the algorithm as far as I remember, but I'm pretty sure Terence Parr must have published a paper on ALL(*). As for the classical LL, LR, LALR and so on, look for the "dragon book" (a note of caution: it's very mathematical/academic in nature). — Lucas Trzesniewski, Jan 02 '17 at 14:14
@LucasTrzesniewski Thanks, I will try to take the challenge. — smwikipedia, Jan 02 '17 at 14:16
[Parsing Techniques: A Practical Guide](http://www.dickgrune.com/Books/PTAPG_2nd_Edition/). The first edition is available for free download. The literature review on the web page is invaluable. — rici, Jan 02 '17 at 14:37
The question is practically meaningless. Anybody can write their own parser generator or indeed invent their own (probably incorrect) parsing algorithm. There is no way to enumerate these and absolutely no point in doing so. — user207421, Jan 02 '17 at 14:40

score 8 · Accepted Answer · edited May 23 '17 at 12:09

How many contemporary algorithms are there to build a parser?

To start with, one can look at the list of common parser generators.
See: Comparison of parser generators, under the heading Parsing algorithm.

ALL(*)  
Backtracking Bottom-up  
Backtracking LALR(1)  
Backtracking LALR(k)  
GLR  
LALR(1)  
LR(1)  
IELR(1)  
LALR(K)  
LR(K)  
LL  
LL(1)  
LL(*)  
LL(1), Backtracking, Shunting yard
LL(k) + syntactic and semantic predicates  
LL, Backtracking  
LR(0)  
SLR  
Recursive descent  
Recursive descent, Backtracking  
PEG parser interpreter, Packrat  
Packrat (modified)  
Packrat  
Packrat + Cut + Left Recursion  
Packrat (modified), mutating interpreter  
2-phase scannerless top-down backtracking + runtime support  
Packrat (modified to support left-recursion and resolve grammar ambiguity)  
Parsing Machine  
Earley  
Recursive descent + Pratt  
Packrat (modified, partial memoization)  
Hybrid recursive descent / operator precedence  
Scannerless GLR  
runtime-extensible GLR  
Scannerless, two phase  
Combinators  
Earley/combinators  
Earley/combinators, infinitary CFGs  
Scannerless GLR  
delta chain

Besides parser generators, there are other algorithms and means of parsing. In particular, Prolog has DCG, and most people who write their first parser from scratch without formal training start with recursive descent. There are also chart parsers and left-corner parsers.
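As a rough illustration of the recursive descent approach mentioned above, here is a minimal sketch in Python. The arithmetic grammar, the `tokenize` helper, and the `Parser` class are all made up for this example; they are not taken from any particular tool. Each grammar rule becomes one method, and the parser evaluates the expression as it goes.

```python
import re

# Hypothetical grammar, chosen only for illustration:
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Look at the next token without consuming it (None at end of input).
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        # Consume and return the next token.
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        token = self.take()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    """Parse and evaluate an arithmetic expression."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Calling `parse("1 + 2 * 3")` walks `expr`, `term`, and `factor` in turn, so operator precedence falls out of the rule nesting rather than from a table. Note that the rules use loops instead of left-recursive calls, since a naive recursive descent parser would recurse forever on a left-recursive rule.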

When writing a parser, the first question I always ask myself is how I can write a grammar for the language at the highest type in the Chomsky hierarchy. Here the lowest is Type-0 (the least restricted) and the highest is Type-3 (the most restricted).

Almost 90% of the time it is a Type-2 grammar (context-free grammar); for the easier tasks it is a Type-3 grammar (regular grammar). I have experimented with Type-1 grammars (context-sensitive grammars) and even Type-0 grammars (unrestricted grammars).
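To make the Type-3 versus Type-2 distinction concrete, here is a small sketch in Python. The two languages (identifiers and balanced parentheses) and the function names are my own choices for illustration. An identifier is a regular language, so a plain regular expression recognizes it. Balanced parentheses with arbitrary nesting depth are not regular, so the second function uses recursion that mirrors a context-free rule instead.

```python
import re

# Type-3 (regular): identifiers can be described by a regular expression.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text):
    """Return True if the whole text is a single identifier."""
    return IDENTIFIER.fullmatch(text) is not None


# Type-2 (context-free): balanced parentheses, using the illustrative rule
#   S -> '(' S ')' S | empty
def is_balanced(text):
    """Return True if text consists only of correctly nested parentheses."""

    def parse_s(pos):
        # Try the alternative '(' S ')' S; return the position after the
        # match, or None if an opening parenthesis is never closed.
        if pos < len(text) and text[pos] == "(":
            inner = parse_s(pos + 1)
            if inner is None or inner >= len(text) or text[inner] != ")":
                return None
            return parse_s(inner + 1)
        # Otherwise take the empty alternative and consume nothing.
        return pos

    return parse_s(0) == len(text)
```

The recursive call in `parse_s` is what tracks nesting depth; a classical regular expression has no equivalent mechanism, which is why the language needs a Type-2 grammar.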

And what's the advantage/limitation of ANTLR's Adaptive LL(*) algorithm?

See the paper written by Terence Parr, the creator of Adaptive LL(*): Adaptive LL(*) Parsing: The Power of Dynamic Analysis

In practical terms, Adaptive LL(*) lets you get from a grammar to a working parser faster because you do not have to understand as much parsing theory. Adaptive LL(*) is, shall I say, nimble enough to sidestep the mines you unknowingly place in the grammar. The price for this is that some of those mines can lead to inefficiencies in the runtime of the parser.

For most practical programming language purposes Adaptive LL(*) is enough. IIRC Adaptive LL(*) cannot do Type-0 grammars (unrestricted grammars), which Prolog DCG can, but as I said, most people and most common programming tasks only need either Type-2 or Type-3.

Also, most parser generators are for Type-2, but that does not mean they can't do Type-1 or possibly Type-0. I cannot be more specific as I do not have practical experience with all of them.

Any time you use a parsing tool or library, there is a learning curve: you have to learn how to use it and what it can and cannot do.

If you are new to lexing/parsing and really want to understand it more, take a course and/or read Compilers: Principles, Techniques, and Tools (2nd Edition).

Yes, but what you really want is a parser generator engine that covers the broadest range of languages with the least amount of fuss. As a practical matter, GLR does extremely well on this front (arbitrary context-free grammars) and is available in a number of tools. GLL is equally good but pretty hard to find. Earley is OK but not very efficient; at least it's relatively easy to code. Everything else has trouble with real grammars; you are only choosing between which parsing pit you fall into, and how much work it takes to climb out of the pit for your particular grammar. Including ANTLR. — Ira Baxter, Jan 03 '17 at 00:31
I like the way @IraBaxter says it, `you are only choosing between which parsing pit you fall into, and how much work it takes to climb out the pit for your particular grammar. Including ANTLR`. — Guy Coder, Jan 03 '17 at 17:53
