
This is a hand-coded, pedal-to-the-metal question, not ANTLR vs. Bison.
Also, this is for parsing a binary format, so there is no lexical analysis.

Olsonist
  • Welcome to Stack Overflow! I edited your question as far as I could guess your problem. However, please add more description so that more people with knowledge of the subject will see it. Good luck! – Enamul Hassan Jul 23 '16 at 04:24
  • Even if you have a binary format, you still have to identify tokens (e.g., extract a four-byte integer -- in little-endian or big-endian byte order, depending on your protocol -- from the input stream), which is the moral equivalent of lexical analysis. And in most cases, that will still take more cycles than the relatively trivial cost of associating tokens into syntactic forms ("parsing"). – rici Jul 24 '16 at 22:10
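
A minimal sketch of that kind of binary "token" extraction, assuming a little-endian wire format (read_u32_le is a hypothetical helper name, not something from the question):

    #include <cstdint>

    // Pull a 32-bit little-endian integer out of a raw byte buffer -- the
    // binary-format counterpart of "reading the next token". Assembling the
    // value byte by byte keeps the code independent of the host's endianness
    // and avoids unaligned loads.
    inline uint32_t read_u32_le(const uint8_t* p) {
        return  static_cast<uint32_t>(p[0])
              | static_cast<uint32_t>(p[1]) << 8
              | static_cast<uint32_t>(p[2]) << 16
              | static_cast<uint32_t>(p[3]) << 24;
    }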

3 Answers


The cost of parsing a strict pre- (or post-) order expression is trivial, using either top-down or bottom-up techniques. It will be dwarfed by any of the other tasks, even lexical analysis. The tiny speed differences will be the result of implementation details rather than algorithmic strategy.

There's no point in using an LR(1) parser, since you don't need token lookahead for either pre-order or post-order representations, assuming the representation is purely pre-/post-order. LR(0) would be just fine. You're unlikely to find a useful LR(0) parser generator, but if you want to hand-write a parser, that fact will simplify your task.
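
As an illustration of the no-lookahead point, here is a minimal hand-written sketch for a hypothetical pre-order encoding (the tag bytes -- 0x00 for a literal followed by a 4-byte little-endian integer, 0x01 for add, 0x02 for multiply -- are invented for the example):

    #include <cstdint>

    // Recursive evaluator for a strict pre-order stream. The byte under the
    // cursor alone determines what to do next, so no lookahead is needed.
    // 'p' is advanced past whatever this call consumes.
    int64_t eval_preorder(const uint8_t*& p) {
        uint8_t tag = *p++;
        if (tag == 0x00) {                       // literal: decode and return it
            uint32_t v = static_cast<uint32_t>(p[0])
                       | static_cast<uint32_t>(p[1]) << 8
                       | static_cast<uint32_t>(p[2]) << 16
                       | static_cast<uint32_t>(p[3]) << 24;
            p += 4;
            return v;
        }
        int64_t lhs = eval_preorder(p);          // first operand
        int64_t rhs = eval_preorder(p);          // second operand
        return tag == 0x01 ? lhs + rhs : lhs * rhs;   // real code would reject unknown tags
    }

Each recursive call consumes exactly the bytes of one subexpression, so the whole stream is parsed in a single left-to-right pass.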

rici

Ignoring LL(1) and LR(1) for the moment, you'd typically parse these sorts of expressions by rolling your own parsing code. You'd maintain a stack of previously parsed and evaluated subexpressions, then repeatedly either pop the top two items off the stack and combine them (if you read another operator) or push a value onto the stack (if you read a number).

There are a few ways you could actually implement that stack. You could have the stack be an explicit stack data structure, where you scan across the input from left to right and push and pop things as appropriate. This is closest in style to how an LR(1) parser works, since you'd be thinking in terms of shifting (pushing) and reducing (popping). You could alternatively use a recursive algorithm and have the call stack take the place of the explicit stack, which is closer in spirit to how LL(1) parsing works.
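
A minimal sketch of the explicit-stack variant, using a hypothetical post-order tag-byte encoding invented for the example (0x00 = literal followed by a 4-byte little-endian integer, 0x01 = add, 0x02 = multiply):

    #include <cstdint>
    #include <vector>

    // Single left-to-right pass over a post-order stream: push decoded
    // literals, and on an operator pop the top two entries, combine them,
    // and push the result. Assumes a well-formed input.
    int64_t eval_postorder(const uint8_t* p, const uint8_t* end) {
        std::vector<int64_t> stack;              // evaluated subexpressions so far
        while (p < end) {
            uint8_t tag = *p++;
            if (tag == 0x00) {                   // literal: decode and push
                uint32_t v = static_cast<uint32_t>(p[0])
                           | static_cast<uint32_t>(p[1]) << 8
                           | static_cast<uint32_t>(p[2]) << 16
                           | static_cast<uint32_t>(p[3]) << 24;
                p += 4;
                stack.push_back(v);
            } else {                             // operator: pop two, combine, push
                int64_t rhs = stack.back(); stack.pop_back();
                int64_t lhs = stack.back(); stack.pop_back();
                stack.push_back(tag == 0x01 ? lhs + rhs : lhs * rhs);
            }
        }
        return stack.back();                     // the one remaining entry is the result
    }

Pushing a decoded literal plays the role of a shift and the pop-two-and-combine step plays the role of a reduction, which is why this style lines up with the LR way of thinking.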

If you just care about raw performance, both LL(1) and LR(1) parsing seem like total overkill in this case. They're designed to handle large classes of general grammars, and the table overhead would likely eat into your performance. I'd just write the code in the two different ways (explicit stack/bottom-up vs. implicit stack/top-down) and see which of the two actually ends up being faster.
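
A sketch of how that head-to-head measurement might look, assuming each hand-written parser is wrapped in a callable taking a pointer to the encoded bytes and a length (the harness and its names are invented for the example):

    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Time one parser over many repetitions to amortize timer overhead.
    // 'parse' is any callable with the signature int64_t(const uint8_t*, size_t).
    template <typename Parser>
    double seconds_per_parse(Parser parse, const std::vector<uint8_t>& input, int reps) {
        auto start = std::chrono::steady_clock::now();
        int64_t sink = 0;                        // accumulate results so the
        for (int i = 0; i < reps; ++i)           // compiler can't discard the work
            sink += parse(input.data(), input.size());
        auto stop = std::chrono::steady_clock::now();
        std::printf("checksum %lld\n", static_cast<long long>(sink));
        return std::chrono::duration<double>(stop - start).count() / reps;
    }

Run it once per variant on the same encoded input, with enough repetitions that timer resolution and cache warm-up stop mattering.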

templatetypedef
  • Thanks. This is pedal to the metal where parse latency is critical. My intuition says that LL is better for prefix and LR is better for postfix. But as you suggest, I'll write both and test. Actually, I'll probably write all four cases, LL/pre, LL/post, LR/pre, LR/post, and test them. – Olsonist Aug 08 '16 at 22:32
  • Actually, call stacks have explicit HW support on modern x86s. That gives recursive descent an advantage over table-driven parsing that I hadn't thought of. – Olsonist Aug 08 '16 at 23:06

This article, LL and LR Parsing Demystified, backs up my intuition:

Polish and Reverse Polish notation directly correspond, in my view, to LL and LR parsing, respectively.

Olsonist