So I found the solution.
There's a feature called partial parsing in Happy. It is described in the documentation, though I only discovered it by reading the git log of the source repository. It allows the parser to stop and discard the remaining input. It is declared using the %partial directive in place of %name:
%name parser {- normal parser -}
%partial parser {- partial parser -}
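For context, here is a minimal sketch of what a grammar file using the directive could look like. The module, token, and rule names are hypothetical, not taken from my actual grammar:

```
{
module Parser (parseProgram) where
}

%partial parseProgram
%tokentype { Token }
%error     { parseError }

%token
  begin { TBegin }
  end   { TEnd }
  '.'   { TDot }

%%

-- With %partial, the parser returns a result as soon as this rule
-- is reduced, instead of insisting that the whole token stream
-- be consumed.
Program : begin end '.' { () }
```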
But the way it works doesn't fit my second requirement, that it must not force the lazy tokenizer to consume any further input: the partial parser still demands exactly one more token of lookahead to verify that there's nothing more to parse.
Assume that ! is not a valid symbol, so the tokenizer fails to consume it, and consider the following inputs:

(1) begin end. valid_token!!!
(2) begin end.!
Parsing (1) succeeds, because Happy reads the lookahead valid_token and stops there, but parsing (2) fails, since one more token is needed and the tokenizer is unable to provide it.
Apparently there's no way to change this behavior, so my workaround is to represent a lexical error by a special token that appears nowhere in the grammar: when the tokenizer encounters ! (or any other invalid character), it yields this error token instead of failing. As a bonus, this should also help with recovering from lexical errors.
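A sketch of such a tokenizer, assuming a hypothetical Token type for the inputs above (the names are illustrative, not from my actual code). The key point is the catch-all case, which turns an invalid character into a TLexError token instead of aborting:

```haskell
import Data.Char (isAlpha, isAlphaNum, isSpace)

-- Hypothetical token type; TLexError appears nowhere in the grammar.
data Token = TBegin | TEnd | TDot | TIdent String | TLexError Char
  deriving (Eq, Show)

isIdentChar :: Char -> Bool
isIdentChar c = isAlphaNum c || c == '_'

-- A total, lazy tokenizer: it never fails, so the parser is free to
-- stop pulling tokens at any point.
tokenize :: String -> [Token]
tokenize [] = []
tokenize s@(c:cs)
  | isSpace c = tokenize cs
  | isAlpha c = let (word, rest) = span isIdentChar s
                in classify word : tokenize rest
  | c == '.'  = TDot : tokenize cs
  | otherwise = TLexError c : tokenize cs  -- lexical error becomes a token
  where
    classify "begin" = TBegin
    classify "end"   = TEnd
    classify w       = TIdent w
```

With this, input (2) tokenizes to [TBegin, TEnd, TDot, TLexError '!'], so the partial parser can read its one token of lookahead and still accept after begin end. has been parsed.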