I'm using ANTLR 4 to parse a protocol's messages, let's name it 'X'. Before extracting a message's information , I have to check if it complies with X's rules.
Suppose we have to parse X's 'FOO' message that follows the following rules:
- Message starts with the 'messageIdentifier' that consists of the 3-letter reserved word FOO.
- Message contains 5 fields, of which the first 2 are mandatory (must be included) and the rest 3 are optional (can be not included).
- Message's fields are separated by the character '/'. If there is no information in a field (that means that the field is optional and is omitted) the '/' character must be preserved. Optional fields and their associated filed separators '/' at the end of the message may be omitted where no further information within the message is reported.
- A message can expand in multiple lines. Each line must have at least one non-empty field (mandatory or optional). Moreover, each line must start with a '/' character and end with a non-empty field following a '\n' character. Exception is the first line that always starts with the reserved word FOO.
- Each message's field also has its own rules regarding the accepted tokens, which will be shown in the grammar below.
Sample examples of valid FOO messages:
FOO/MANDATORY_1/MANDATORY2/OPT 1/HELLO/100\n
FOO/MANDATORY_1/MANDATORY2\n
/OPT 1\n
/HELLO\n
/100\n
FOO/MANDATORY_1/MANDATORY2\n
FOO/MANDATORY_1/MANDATORY2//HELLO/100\n
FOO/MANDATORY_1/MANDATORY2///100\n
FOO/MANDATORY_1/MANDATORY2/OPT 1\n
FOO/MANDATORY_1/MANDATORY2 ///100\n
Sample examples of non-valid FOO messages:
FOO\n
/MANDATORY_1/MANDATORY2/OPT 1/HELLO/100\n
FOO/MANDATORY_1/\n
MANDATORY2/OPT 1/HELLO/100\n
FOO/MANDATORY_1/MANDATORY2/OPT 1//\n
FOO/MANDATORY_1/MANDATORY2/OPT 1/\n
/100\n
FOO/MANDATORY_1/MANDATORY2/OPT 1/HELLO/100\n
FOO/MANDATORY_1/MANDATORY2/\n
FOO/MANDATORY_1/MANDATORY2/OPT 1/HELLO/100
Below follows the grammar for the above message:
grammar Foo_Message
/* Parser Rules */
startRule : 'FOO' mandatoryField_1 ;
mandatoryField_1 : '/' field_1 NL? mandatoryField_2 ;
mandatoryField_2 : '/' field_2 NL? optionalField_3 ;
optionalField_3 : '/' field_3 NL? optionalField_4
| '/' optionalField_4
| optionalField_4
;
optionalField_4 : '/' field_4 NL? optionalField_5
| '/' optionalField_5
| optionalField_5
;
optionalField_5 : '/' field_5 NL?
| NL
;
field_1 : (A | N | B | S)+ ;
field_2 : (A | N)+ ;
field_3 : (A | N | B)+ ;
field_4 : A+ ;
field_5 : N+ ;
/* Lexer Rules */
A : [A-Z]+ ;
N : [0-9]+ ;
B : ' ' -> skip ;
S : [*&@#-_<>?!]+ ;
NL : '\r'? '\n' ;
The above grammar parses correctly any input that complies with FOO message's rules. The problem resides in parsing a line that ends with the '/' character, which according to the protocol's FOO message's rules is an invalid input. I understand that the second alternatives of rules 'optionalField_3', 'optionalField_4' and 'optionalField_5' lead to this behavior but I can't figure out how to make a rule for this. Somehow I need the parser to remember that he came to 'optionalField_5' rule after seeing a non-omitted field in the previous rule, which if I am not mistaken can't be done in ANTLR as I can't check from which alternative of the previous rule I reached the current rule.
Is there a way to make the parser 'remember' this by some explicit option-rule? Or does my grammar need to be rearranged and if yes how?