Regarding the following reduced grammar

proof_command : 'Proof' 'using' collection '.';
collection : 'Collection' IDENT ':=' section_subset_expr
           | 'Collection' KeySOME ':=' IDENT IDENT IDENT
           ;

KeySOME : 'Some';

(where IDENT is an ordinary identifier, as in Java), I am trying to parse the following input: Proof using Collection Some := a b c . This doesn't work and results in the following error message:

mismatched input 'a' expecting 'section_subset_expr'

This is because IDENT can of course also be 'Some' .

Is there a way to use Some both as a keyword and as an identifier, so that the expression above is parsed correctly? Maybe through a semantic predicate that excludes 'Some' from IDENT in the collection rule? If so, what would that look like?

IDENT : IDENT2;
fragment IDENT2 : FIRST_LETTER (SUBSEQUENT_LETTER)*;
fragment FIRST_LETTER :  [a-z] | [A-Z] | '_' | UNICODE_LETTER;
fragment SUBSEQUENT_LETTER : [a-z] | [A-Z] | DIGIT | '_' | '"' | '\''| UNICODE_LETTER | UNICODE_ID_PART;
fragment UNICODE_LETTER : '\\' 'u' HEX HEX HEX HEX;
fragment UNICODE_ID_PART : '\\' 'u' HEX HEX HEX HEX;
fragment HEX : [0-9a-fA-F];

KeySOME : 'Some'; 
Tilman Zuckmantel

1 Answer

The way the lexer works is that when multiple rules can be matched on the given input, it decides which one to use by the following criteria:

  1. If one rule leads to a longer match than all others, that one is taken (this is known as the maximal munch rule).
  2. If multiple rules lead to an equally long match, the one that appears first in the grammar is taken. Literals that appear directly in a parser rule (such as 'Proof', 'using' and 'Collection' in your grammar) are counted as appearing before any named lexer rules.

Since your KeySOME rule appears after IDENT, it will never be matched: any input that matches KeySOME also matches IDENT with the same length, and IDENT comes first.

So you can either move KeySOME to appear before IDENT or you can remove the rule altogether and just use 'Some' directly in its place (i.e. 'Collection' 'Some' ':=' IDENT IDENT IDENT).
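
A minimal sketch of the first option, assuming the lexer rules from the question. Only the order of the two rules changes; IDENT is written here without the intermediate IDENT2 fragment, and the remaining fragment rules (FIRST_LETTER, SUBSEQUENT_LETTER, and so on) are assumed to stay exactly as posted:

```antlr
// Keyword rule first: on an equal-length match, the earlier rule wins.
KeySOME : 'Some';

// 'Some' still matches IDENT, but KeySOME is listed first and takes it.
// Longer input such as 'Something' is still an IDENT (longest match wins).
IDENT : FIRST_LETTER SUBSEQUENT_LETTER*;
```

With this ordering the input `Some` is always tokenized as KeySOME and never as IDENT, so it is accepted only where the parser rules explicitly ask for the keyword.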

sepp2k
  • This brought me a step further, but now I can't use Some as an identifier anymore. This won't parse: "Collection Some := Some Some Some." Is there a way to deal with this? Just to be clear, the language allows Some both as a keyword and as an identifier. – Tilman Zuckmantel Jun 22 '18 at 12:29
  • @TilmanZuckmantel Each token has exactly one token type based on the lexer rules. Either `Some` is always a keyword or always an `IDENT`. If you want to allow `Some` in place of an `IDENT` in some places, you'll have to explicitly add it as an alternative in those cases. Perhaps through a rule `idOrSome : IDENT | 'Some';`. – sepp2k Jun 22 '18 at 12:42
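
A sketch of how the suggested `idOrSome` rule might be applied to the collection rule from the question. This is an assumption about where `Some` should be accepted as an identifier, not a confirmed grammar; whether the collection name in the first alternative should also allow `Some` is left open:

```antlr
// Parser rule from the comment: the keyword token is also accepted
// wherever this rule is used in place of a plain IDENT.
idOrSome : IDENT | 'Some';

collection : 'Collection' IDENT ':=' section_subset_expr
           | 'Collection' 'Some' ':=' idOrSome idOrSome idOrSome
           ;
```

Under these rules both `Collection Some := a b c` and `Collection Some := Some Some Some` match the second alternative, because each of the three trailing positions accepts either token type.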