While writing parsers in Menhir, I keep running into a pattern that is becoming very frustrating. I'm trying to build a parser that accepts either "a*ba" or "bb". To do this, I'm using the following rule (note that A* is the same as list(A)):
exp:
| A*; B; A; {1}
| B; B; {2}
However, this parser fails to parse the string "ba". Menhir also reports shift/reduce conflicts in the grammar, specifically as follows:
** In state 0, looking ahead at B, shifting is permitted
** because of the following sub-derivation:
. B B
** In state 0, looking ahead at B, reducing production
** list(A) ->
** is permitted because of the following sub-derivation:
list(A) B A // lookahead token appears
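For reference, my understanding is that Menhir's standard library defines list(X) roughly as sketched here (paraphrased from the manual, so treat the exact layout as an assumption). The empty alternative is the production list(A) -> named in the report: to match zero A's, the parser has to reduce it before it has shifted anything.

```ocaml
(* Approximate standard-library definition of list(X); A* is sugar for list(A). *)
%public list(X):
| (* empty: this is the "list(A) ->" production from the conflict report *)
    { [] }
| x = X; xs = list(X)
    { x :: xs }
```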
So | B B requires a shift, while | A* B A requires a reduce (of the empty list(A)) when the first token is B. I can resolve this conflict manually and get the expected behavior by changing the rule to read as follows (note that A+ is the same as nonempty_list(A)):
exp2:
| B; A; {1}
| A+; B; A; {1}
| B; B; {2}
In my mind, exp and exp2 describe the same language, but they are clearly treated differently. Is there a way to write exp so that it does what I want without code duplication (which can cause other problems)? Is this a design pattern I should be avoiding entirely?