The problem is that you're ignoring the shift/reduce conflict reported by your parser generator. While yacc/bison (and presumably PLY) will resolve conflicts for you, that resolution might not do what you want, and might result in a parser that parses a language other than the one you are trying to parse.
Whenever you get a shift/reduce (or reduce/reduce) conflict from an LR parser generator, you really need to understand what the conflict is (and why it occurs) to know whether you can ignore it or whether you need to fix it. So let's fix your grammar by getting rid of the 'hack' (which is clearly wrong and not something you want to parse), as well as the useless 'empty' rule (which just confuses things):
%token FILE NUMBER
%%
algebraic_notation : piece start_position capture end_position promotion
piece : 'K' | 'Q' | 'B' | 'N' | 'R' | /*pawn*/
start_position : FILE | NUMBER | FILE NUMBER | /*empty*/
end_position : FILE NUMBER
capture : 'x' | /*empty*/
promotion : '=' 'Q' | '=' 'R' | '=' 'N' | '=' 'B' | /*empty*/
Now when you run this through 'bison -v' (ALWAYS use -v to get the verbose output file -- I'm not sure what PLY's equivalent is), you get the message about a shift/reduce conflict, and if you look in the .output file you can see what it is:
state 7
1 algebraic_notation: piece . start_position capture end_position promotion
FILE shift, and go to state 9
NUMBER shift, and go to state 10
FILE [reduce using rule 11 (start_position)]
$default reduce using rule 11 (start_position)
start_position go to state 11
This is telling you that after seeing a piece, when the next token is FILE, it doesn't know whether it should shift (treating the FILE as (part of) the start_position) or reduce (giving an empty start_position). That's because it needs more lookahead to see if there's a second position to use as an end_position to know what to do. Bison resolves the conflict by shifting, so simply ignoring the conflict will result in a parser that fails to parse lots of valid things (basically, anything where both start_position and capture are empty).
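To see concretely which moves break, here is a rough Python sketch that imitates the conflicted parser when the conflict is resolved as a shift (bison's default for shift/reduce conflicts). It is a hand-written approximation, not generated parser code; the `tokenize` helper, the a-h/1-8 token classes, and the name `accepts_when_shifting` are all assumptions made for illustration.

```python
PIECES = {"K", "Q", "B", "N", "R"}
PROMOTIONS = {"Q", "R", "N", "B"}

def tokenize(move):
    # Assumed lexer: a-h is FILE, 1-8 is NUMBER, anything else is a literal.
    kinds = []
    for ch in move:
        if ch in "abcdefgh":
            kinds.append("FILE")
        elif ch in "12345678":
            kinds.append("NUMBER")
        else:
            kinds.append(ch)
    return kinds

def accepts_when_shifting(move):
    """Imitate the parser when a FILE after the piece is always shifted."""
    toks = tokenize(move)
    i = 0
    # piece: K, Q, B, N, R, or nothing for a pawn
    if toks[i:i + 1] and toks[i] in PIECES:
        i += 1
    # start_position: a FILE or NUMBER here is always taken as start_position
    if toks[i:i + 1] == ["FILE"]:
        i += 1
        if toks[i:i + 1] == ["NUMBER"]:
            i += 1
    elif toks[i:i + 1] == ["NUMBER"]:
        i += 1
    # capture: optional 'x'
    if toks[i:i + 1] == ["x"]:
        i += 1
    # end_position: FILE NUMBER is mandatory
    if toks[i:i + 2] != ["FILE", "NUMBER"]:
        return False
    i += 2
    # promotion: optional '=' followed by Q, R, N or B
    if toks[i:i + 1] == ["="]:
        if not (toks[i + 1:i + 2] and toks[i + 1] in PROMOTIONS):
            return False
        i += 2
    return i == len(toks)

for move in ["e4", "Nf3", "Nxe4", "exd5", "Nbd7"]:
    print(move, accepts_when_shifting(move))
```

Under these assumptions, `e4` and `Nf3` are rejected, because the square gets swallowed as a start_position and nothing is left for end_position, while `Nxe4`, `exd5` and `Nbd7` still go through.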
The best way to solve a lookahead-related shift-reduce conflict involving an empty production like this (or pretty much any conflict involving an empty production, really) is to unfactor the grammar -- get rid of the empty rule and duplicate any rule that uses the non-terminal both with and without it. In your case, this gives you the rules:
algebraic_notation : piece capture end_position promotion
algebraic_notation : piece start_position capture end_position promotion
start_position : FILE | NUMBER | FILE NUMBER
(the other rules are unchanged)
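The unfactoring step is mechanical enough to sketch in a few lines of Python. This is only an illustration of the transformation described above, not something bison or PLY does for you; the dict-of-tuples grammar representation and the `unfactor` function are hypothetical.

```python
def unfactor(grammar, nt):
    """Drop the empty alternative of `nt` and duplicate every rule that
    mentions `nt`, once without it and once with it."""
    if () not in grammar.get(nt, []):
        return grammar  # nothing to do: nt has no empty production
    result = {}
    for head, alternatives in grammar.items():
        new_alts = []
        for alt in alternatives:
            if head == nt and alt == ():
                continue  # remove the empty production itself
            # Build every variant of this alternative with nt absent/present.
            variants = [()]
            for sym in alt:
                if sym == nt:
                    variants = [v + extra for v in variants
                                for extra in ((), (sym,))]
                else:
                    variants = [v + (sym,) for v in variants]
            for v in variants:
                if head == nt and v == ():
                    continue  # do not reintroduce the empty production
                if v not in new_alts:
                    new_alts.append(v)
        result[head] = new_alts
    return result

grammar = {
    "algebraic_notation": [
        ("piece", "start_position", "capture", "end_position", "promotion"),
    ],
    "start_position": [("FILE",), ("NUMBER",), ("FILE", "NUMBER"), ()],
    "capture": [("x",), ()],
}

step1 = unfactor(grammar, "start_position")
for alt in step1["algebraic_notation"]:
    print("algebraic_notation :", " ".join(alt))
```

Applied to `start_position`, this yields the two `algebraic_notation` rules shown above. Note that a rule consisting only of the removed non-terminal would turn into a new empty rule, which this sketch does not try to handle further.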
With that you still have a shift-reduce conflict:
state 7
1 algebraic_notation: piece . capture end_position promotion
2 | piece . start_position capture end_position promotion
FILE shift, and go to state 9
NUMBER shift, and go to state 10
'x' shift, and go to state 11
FILE [reduce using rule 14 (capture)]
start_position go to state 12
capture go to state 13
Basically, we've just moved the conflict one step and now have the problem with the empty capture rule. So we unfactor that as well:
algebraic_notation : piece end_position promotion
algebraic_notation : piece capture end_position promotion
algebraic_notation : piece start_position end_position promotion
algebraic_notation : piece start_position capture end_position promotion
capture : 'x'
and now bison reports no more conflicts, so we can be reasonably confident it will parse the way we want. You can simplify it a bit more by getting rid of the capture rule and using a literal 'x' in the algebraic_notation rule. I personally prefer this, as I think it is clearer to avoid the unnecessary indirection:
%token FILE NUMBER
%%
algebraic_notation : piece end_position promotion
algebraic_notation : piece 'x' end_position promotion
algebraic_notation : piece start_position end_position promotion
algebraic_notation : piece start_position 'x' end_position promotion
piece : 'K' | 'Q' | 'B' | 'N' | 'R' | /*pawn*/
start_position : FILE | NUMBER | FILE NUMBER
end_position : FILE NUMBER
promotion : '=' 'Q' | '=' 'R' | '=' 'N' | '=' 'B' | /*empty*/
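If you want a quick sanity check of which moves this final grammar accepts without running a parser generator, a small backtracking recognizer over the same rules will do. This is a sketch, not an LR parser and not what bison generates; the `GRAMMAR` table is a transcription of the rules above, while `tokenize`, `derives` and `accepts`, along with the a-h/1-8 token classes, are assumptions for illustration.

```python
GRAMMAR = {
    "algebraic_notation": [
        ("piece", "end_position", "promotion"),
        ("piece", "x", "end_position", "promotion"),
        ("piece", "start_position", "end_position", "promotion"),
        ("piece", "start_position", "x", "end_position", "promotion"),
    ],
    "piece": [("K",), ("Q",), ("B",), ("N",), ("R",), ()],
    "start_position": [("FILE",), ("NUMBER",), ("FILE", "NUMBER")],
    "end_position": [("FILE", "NUMBER")],
    "promotion": [("=", "Q"), ("=", "R"), ("=", "N"), ("=", "B"), ()],
}

def tokenize(move):
    # Assumed lexer: a-h is FILE, 1-8 is NUMBER, anything else is a literal.
    return tuple(
        "FILE" if ch in "abcdefgh" else "NUMBER" if ch in "12345678" else ch
        for ch in move
    )

def derives(symbols, tokens):
    """True if the symbol sequence can derive exactly the token sequence."""
    if not symbols:
        return not tokens
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Non-terminal: try each alternative (the grammar has no left recursion).
        return any(derives(alt + rest, tokens) for alt in GRAMMAR[first])
    # Terminal: must match the next token exactly.
    return bool(tokens) and tokens[0] == first and derives(rest, tokens[1:])

def accepts(move):
    return derives(("algebraic_notation",), tokenize(move))

for move in ["e4", "Nf3", "exd5", "Nbd7", "R1a3", "e8=Q", "e8=K"]:
    print(move, accepts(move))
```

Because it backtracks, this recognizer never faces the lookahead problem, so it is only useful as a cross-check on the language, not as evidence that the grammar is conflict-free; for that you still need the bison report.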