EDIT: I've been asked if I can provide the full grammar. I cannot, and here is why:
I cannot provide my full grammar code because it is homework and I am not allowed to disclose my solution. I will understand if my question cannot be answered because of this. I am just hoping this is a simple thing that I am failing to understand from the documentation, and that this will be enough for someone who knows ANTLR4 to know the answer.
This was in the original question, but to spare potential helpers frustration I have moved it to the top of the post. Disclaimer: this is homework related.
I am trying to tokenize a piece of text for homework, and almost everything works as expected except the following:
TIME : '<time>';
This rule used to be in my grammar. When tokenizing the text, I would not see the TIME token; instead I would see a '<time>' token (which I guess ANTLR created for me somehow). But when I moved the string itself into a fragment rule and made the TIME rule point to it, like so:
fragment TIME_TAG : '<time>';
.
.
.
TIME : TIME_TAG;
Then I see the TIME token as expected. I've been searching the internet for several hours and couldn't find an answer.
A similar thing happens with the ATHLETE rule, which is defined as:
ATHLETE : WHITESPACE* '<athlete>' WHITESPACE*;
This rule is also recognized properly and I see the ATHLETE token, but it was not recognized when the rule did not allow WHITESPACE* before and after the tag string.
Here is my piece of text:
World Record World Record
[1] <time> 9.86 <athlete> "Carl Lewis" <country> "United
States" <date> 25 August 1991
[2] <time> 9.69 <athlete> "Tyson Gay" <country> "United
States" <date> 20 September 2009
[3] <time> 9.82 <athlete> "Donovan Baily" <country>
"Canada" <date> 27 July 1996
[4] <time> 9.58
<athlete> "Usain Bolt"
<country> "Jamaica" <date> 16 August 2009
[5] <time> 9.79 <athlete> "Maurice Greene" <country>
"United State" <date> 16 June 1999
My task is simply to tokenize it. I am not given the token definitions; I am supposed to decide on them myself. I think '<sometag>' is pretty obvious, as are '"'-wrapped strings, numbers, dates, and square-bracket-enclosed enumerations.
Thanks in advance for any help or useful knowledge.