I know that in general it is undecidable whether a context-free grammar is unambiguous. However, this does not imply that this cannot be decided for a subset of context-free grammars.
A grammar is used to transform an input text into a parse tree. A grammar is ambiguous if it can produce more than one parse tree for a given input.
The LR parser algorithm first transforms the grammar into a LR parser table. Afterwards, it uses a LR parser automaton to process a given input stream into a parse tree using the LR parser table. The first step is usually done by a parser generator while the second step is executed for each parsing operation.
Consider the LR parser table construction algorithm construct(G) = T | error
. The algorithm receives a context-free grammar G
. If the table construction is successful a conflict-free LR parser table T
is returned. If the table construction failed an error is returned. Examples for such an algorithm are SLR, LALR and CLR. Typical examples for the algorithm failing are shift-reduce and reduce-reduce conflicts.
For a finite input stream and a conflict-free LR parser table the LR parser automaton can deterministically derive one parse tree from a given input stream or return an error iff the input stream does not match the grammar. The parsing step can be formalized as parse(T, I) = O | error
, where T
is the conflict-free LR parsing table, I
is the input stream of tokens and O
is a single parse tree. The error is returned iff the input stream does not match the grammar.
Question
Summing up the statements above my understanding is that any grammar that can somehow be transformed into a conflict-free LR parser table is unambiguous. However, if the algorithm returns an error, this does not imply any statement about whether the grammar is ambiguous. Thus, a LR table construction algorithm semi-decides whether a subset of the context-free grammars is unambiguous. Is this correct?
Here are some steps where my conclusion might fall down:
- the LR table construction algorithm might not terminate
- the LR parser automaton might not terminate
- the LR table construction algorithm is incorrect and so the LR parser automaton will accept not the exact same set of input streams as the grammar from which it was constructed or it derives another parse tree than the grammar.
To my knowledge none of the above is true for common LR table construction algorithms.
I was not able to find a clear statement about this, so I would greatly appreciate an explanation as well as a reference where this question is discussed and clearly answered.
I think this question is quite relevant when designing a programming language, since if my conclusion is correct a LR parser makes sure that every program written in the language can be properly parsed. Are there other methods to ensure that a grammar is unambiguous?