0

I have defined two sets of identifiers IDENTIFIER_ONE and IDENTIFIER_TWO which are both exculsive subsets of IDENTIFIER. I would like to write a parser such that:

"i1(arg) EOS" can't be parsed (1)
"i2(arg) EOS" can be parsed (2)
"i1(arg) = value EOS" can be parsed (3)
"i2(arg) = value EOS" can be parsed (4)

where i1(resp., i2) belongs to IDENTIFIER_ONE (resp., IDENTIFIER_TWO); arg and value belong to IDENTIFIER. The following parser.mly has already realized all the points I am after, except (4):

identifier:
| IDENTIFIER_ONE { $1 }
| IDENTIFIER_TWO { $1 } 

block_statement_EOS:
| identifier LPAREN identifier RPAREN EQUAL identifier EOS { BSE_Let ($1, $3, $6) }
| IDENTIFIER_TWO LPAREN identifier RPAREN EOS { BSE_I_I ($1, $3) }

Given i1(arg) = value EOS as input, as goal (3) it is correctly read as BSE_Let (i1, arg, value). However, given i2(arg) = value EOS as input, it stops the parsing after reading EQUAL. I guess it is because once the parse meets i2(arg), it goes to the 2nd rule of block_statement_EOS, and later EQUAL can't be parsed.

Ideally, I would hope the parser could try the 1st rule of block_statement_EOS if the 2nd rule fails. Could anyone help me to make this possible?

PS: If I write the parser.mly as follows, all the goals can be achieved. Does anyone know why? Additionally I really don't like this workaround, because I do need to write identifier instead of two subsets in many other rules, I would hope a more elegant solution...

block_statement_EOS:
| IDENTIFIER_ONE LPAREN identifier RPAREN EQUAL identifier EOS { BSE_Let ($1, $3, $6) }
| IDENTIFIER_TWO LPAREN identifier RPAREN EQUAL identifier EOS { BSE_Let ($1, $3, $6) }
| IDENTIFIER_TWO LPAREN identifier RPAREN EOS { BSE_I_I ($1, $3) }
SoftTimur
  • 5,630
  • 38
  • 140
  • 292

1 Answers1

0

When your parser encounters an LPAREN after an IDENTIFIER_TWO, it has to decide whether to shift or to reduce:

  • shift: put LPAREN on the stack;
  • reduce: replace IDENTIFIER_TWO, which is on top of the stack, by identifier.

If your parser chooses to shift, it will never reduce this particular IDENTIFIER_TWO into identifier (because this particular IDENTIFIER_TWO will never be on top of the stack again), meaning that it will always reduce the second rule of block_statement_EOS.

If your parser chooses to reduce, it will never reduce the second rule of block_statement_EOS, as this rule starts with IDENTIFIER_TWO and not identifier.

This is why your second version works, because there is no need to choose between shifting and reducing after IDENTIFIER_TWO. The choice is made later, if you wish.

Bardou
  • 446
  • 2
  • 2
  • Thank you... Let's still take `i2(arg) = value EOS` as example, does it mean that the parser chooses `shift` when it encounters an `LPAREN`? It is a pity that the parser could not try another choice (`reduce`) if one choice (`shift`) leads to fail... Is there another way to work around, except my second version which I don't like? – SoftTimur Jan 29 '14 at 15:02
  • 1
    All symbols are shifted (i.e. put on the stack) at some point. Shifting basically means: reading the next symbol. What you want would imply backtracking, which is potentially very inefficient. However, it is possible that other parsing technologies would work, but I'm not familiar with them. Take a look at Menhir, maybe it can handle this case better. – Bardou Jan 29 '14 at 16:29