I have a rather peculiar requirement for parsing input with ANTLR. I would like to be able to parse expressions like the following.
Correct Inputs
- user name
- user_name user-name
- | EATALL any thing could come here/ok | EATALL ...
Invalid Inputs
- user/name
- user&name^face
Any text that comes after `| EATALL` and before the next `| EATALL` (if any) must be returned as a single token. For other, simple inputs where `| EATALL` does not appear, only a valid combination of `_`, `-`, and `[a-zA-Z0-9]` is tokenized as one token. In pseudocode:
- user name -> [user] [name]
- user_name -> [user_name]
- |EATALL user/name my user -> [user/name my user]
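
To pin down the expected output, the pseudocode above can be sketched in plain Python (not ANTLR). This is only an illustration under a few assumptions. The function name `tokenize` and the regexes are made up for this sketch. The marker is accepted with or without a space after the pipe. Text before the first marker is split on whitespace into simple tokens. An invalid input is reported by raising `ValueError`.

```python
import re

# Marker that switches to "eat everything" mode. Whitespace after the pipe is
# optional so that both "| EATALL" and "|EATALL" are accepted (an assumption).
MARKER = re.compile(r"\|\s*EATALL\b")
# A simple token: letters, digits, underscore, or hyphen.
WORD = re.compile(r"[A-Za-z0-9_-]+")


def tokenize(text):
    """Return the tokens for one input line, or raise ValueError if invalid."""
    parts = MARKER.split(text)
    tokens = []
    # Text before the first marker is split on whitespace into simple words.
    for chunk in parts[0].split():
        if not WORD.fullmatch(chunk):
            raise ValueError(f"invalid token: {chunk!r}")
        tokens.append(chunk)
    # Each span after a marker (up to the next marker) is one verbatim token.
    for span in parts[1:]:
        span = span.strip()
        if span:
            tokens.append(span)
    return tokens


if __name__ == "__main__":
    for line in ["user name", "user_name", "|EATALL user/name my user"]:
        print(line, "->", tokenize(line))
```

Here `user/name` and `user&name^face` raise `ValueError`, because `/`, `&`, and `^` are only allowed after an `EATALL` marker.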
This already seems like an ambiguous tokenization case to me. I am looking for suggestions on how to deal with problems like this in ANTLR. Thanks in advance.