I am trying to write a grammar that will recognize `<<word>>` as a special token but treat `<word>` as just a regular literal.
Here is my grammar:

    grammar test;

    doc: item+ ;
    item: func | atom ;
    func: '<<' WORD '>>' ;
    atom: PUNCT+     #punctAtom
        | NEWLINE+   #newlineAtom
        | WORD       #wordAtom
        ;

    WS : [ \t] -> skip ;
    NEWLINE : [\n\r]+ ;
    PUNCT : [.,?!]+ ;
    WORD : CHAR+ ;

    fragment CHAR : (LETTER | DIGIT | SYMB | PUNCT) ;
    fragment LETTER : [a-zA-Z] ;
    fragment DIGIT : [0-9] ;
    fragment SYMB : ~[a-zA-Z0-9.,?! |{}\n\r\t] ;

So something like `<<word>>` will be matched by two rules, both `func` and `atom`. I want it to be recognized as a `func`, so I put the `func` rule first.
When I test my grammar with `<word>`, it is treated as an `atom`, as expected. However, when I give it `<<word>>`, it is treated as an `atom` as well.
Is there something I'm missing?

PS - I have split `atom` into `PUNCT`, `NEWLINE`, and `WORD` alternatives and labeled them `#punctAtom`, `#newlineAtom`, and `#wordAtom` because I want to treat each of them differently when I traverse the parse tree. Also, a `WORD` can contain `PUNCT` because, for instance, someone can write "Hello," and I want to treat that as a single word (for simplicity later on).

PPS - One thing I've tried is adding `<` and `>` to the last rule (`SYMB`), which lists the symbols I'm "disallowing" inside a `WORD`. This solves one problem, in that `<<word>>` is now recognized as a `func`, but it creates a new one: `<word>` is no longer accepted as an `atom`.
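
For reference, the change I tried amounts to roughly the following edit to the `SYMB` fragment. This is a sketch of the attempt, assuming ANTLR 4 syntax as in the grammar above; everything else in the grammar is unchanged.

```antlr
// Original rule: '<' and '>' are not excluded, so they can appear inside a WORD.
// fragment SYMB : ~[a-zA-Z0-9.,?! |{}\n\r\t] ;

// Attempted rule: '<' and '>' are added to the excluded set,
// so a WORD can no longer contain either character.
fragment SYMB : ~[a-zA-Z0-9.,?! |{}<>\n\r\t] ;
```

As far as I can tell, with this version a lone `<` or `>` is not matched by any lexer rule at all (only the two-character literals `'<<'` and `'>>'` are), which would explain why `<word>` stops being accepted.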