LR(k) parsers, with k infinite, not restricted to deterministic context-free languages?

Question

Is a theoretical LR parser with infinite lookahead capable of parsing (unambiguous) languages which can be desribed by a context-free grammar?

Normally LR(k) parsers are restricted to deterministic context-free languages. I think this means that there always has to be exactly one grammar rule that can be applied currently. Meaning within the current lookahead context not more than one possible way of parsing is allowed to occur. The book "Language Implementation Patterns" states that a "... parser is nondeterministic - it cannot determine which alternative to choose." if the lookahead sets overlap. In contrast a non-deterministic parser just chooses one way if there are multiple alternatives and then goes back to the decision point and chooses the next alternative if it is impossible at a certain point to continue with the decision previously made.

Wherever I read definitions of LR(k) parsers (like on Wikipedia or in the Dragon Book) I always read something like: "k is the number of lookahead tokens" or cases when "k > 1" but never if k can be infinite. Wouldn't an infinite lookahead be the same as trying all alternatives until one succeeds?

Could it be that k is assumed to be finite in order to (implicitly) distinguish LR(k) parsers from non-deterministic parsers?

score 1 · Answer 1 · answered Jun 14 '12 at 23:40

You are raising several issues here that are difficult to answer in a short form. Nevertheless I will try.

First of all, what is "infinite lookahead"? There is no book that describes such parser. If you have clear idea of what is this, you need to describe it first. Only after that we can discuss this topic. For now parsing theory discusses only LR(k) grammars, where the k is finite.

Normally LR(k) parsers are restricted to deterministic context-free languages. I think this means that there always has to be exactly one grammar rule that can be applied currently.

This is wrong. LR(k) grammars may have "grammar conflicts". Dragon book briefly mentions them without going into any details. "Grammars may have conflicts" means that some grammars do not have conflicts, while all other grammars have them. When grammar does not have conflicts, then there is ALWAYS only one rule and the situation is relatively simple.

When grammar has conflicts, this means that in certain cases more than one rule is applicable. Classic parsers cannot help here. What makes matters worse is that some input statement may have a set of correct parsings, not just one. From the grammar theory stand point all these parsings have the same value and importance.

The book "Language Implementation Patterns" states that a "... parser is nondeterministic - it cannot determine which alternative to choose."

I have impression that there is no definitive agreement on what "nondeterministic parser" means. I would tend to say that nondeterministic parser just picks up one of the alternatives randomly and goes on.

Practically only 2 strategies of resolving conflicts are used. The first one is conflict resolution in the callback handler. Callback handler is a regular code. Programmer, who writes it, checks whatever he wants in any way he wants. This code only gives back the result - what action to take. For the parser on top this callback handler is a black box. There is no theory here.

Second approach is called "backtracking". The idea behind is very simple. We do not know where to go. Ok, let's try all possible alternatives. In this case all variants are tried. There is nothing non deterministic here. There are several different flavors of backtracking.

If this is not enough I can write a little bit more.

score 0 · Answer 2 · answered Aug 12 '12 at 18:35

nondeterminism means that in order to produce the correct result(s!), a finite state machine reads a token and then has N>1 next states. You can recognize a nondeterministic FSM if a node has more than one outgoing edge with the same label. Note that not every branch has to be valid, but the FSM can't pick just one. In practice you could fork here, resulting in N state machines or you could try a branch completely and then come back and try the next one until every outgoing statetransfer was tested.

LR(k) parsers, with k infinite, not restricted to deterministic context-free languages?

2 Answers2