
I have an ANTLR grammar file with the following string definition:

STRING
    :   '"' ( EscapeSequence | ~('\\'|'"') )* '"'
    ;

fragment EscapeSequence
    :   '\\' .
    ;

However, this lexer rule ignores the escape character in front of the first quote. In

id\=\"

the \" is recognized as the start of a string even though it is preceded by an escape character. This happens only for the first quote; all subsequent escaped quotes are recognized properly.

/id\=\"Testing\" -- Should not be a string as both quotes are escaped
/id\="Testing" -- Should be a string between the quotes, since they are not escaped

The problem is to stop the lexer from trying to recognize a string when the character immediately preceding a quote is an escape character. If there are multiple escape characters, I only need to consider the single character just before the opening quote.
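To reproduce the symptom outside ANTLR, here is a minimal Python sketch. It transcribes the STRING rule above into a regular expression and runs a toy lexer that, as an assumption for illustration only, turns every other character into a single-character token. The `tokenize` function, the `CHAR` token name and the extra ` x "y"` at the end of the input are hypothetical additions, not part of the original grammar. Because no rule consumes a backslash together with the quote after it, the first \" is free to open a STRING that runs to the next unescaped quote:

```python
import re

# Python transcription of the question's STRING rule:
#   '"' (EscapeSequence | ~('\\'|'"'))* '"'   with   EscapeSequence : '\\' . ;
# DOTALL makes '.' match any character, like ANTLR's wildcard.
STRING = re.compile(r'"(?:\\.|[^\\"])*"', re.DOTALL)


def tokenize(text):
    """Toy lexer (hypothetical): STRING if it matches here, else one character."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = STRING.match(text, pos)
        if m:
            tokens.append(("STRING", m.group()))
            pos = m.end()
        else:
            # Assumption: every other character becomes its own CHAR token,
            # so a backslash is never glued to the quote that follows it.
            tokens.append(("CHAR", text[pos]))
            pos += 1
    return tokens


for tok in tokenize(r'/id\=\"Testing\" x "y"'):
    print(tok)
```

With this input the escaped quote opens a STRING whose text is "Testing\" x ", which is the behavior described in the question.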

Sudeep Hazra

1 Answer


ANTLR will automatically provide the behavior you desire in almost every situation. Consider the following input:

/id\=\"Testing\"

The critical requirement concerns the position and length of the token that precedes the first quote character. In the block below, I add spaces only to illustrate conditions that occur between characters.

/ i d \ = \ " T e s t i n g \ "
           ^
           |
           ----------- Make sure no token can *end* here

If the first " character is included in the same token as the \ character before it, that " character can never be interpreted as the start of a STRING token.

If the above condition is not met, your " character will be treated as the start of a STRING token.
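One way to meet this condition is a lexer rule that matches a backslash followed by any single character, for example something like ESCAPE : '\\' . ; in the grammar. The rule name ESCAPE is hypothetical and the answer does not prescribe a specific rule. The Python sketch below simulates that idea. It assumes ANTLR 4-style lexing (the longest match wins, and on a tie the rule listed first wins) and, as in any toy model, treats every other character as a single-character token:

```python
import re

# Rules in priority order; on equal match length the earlier rule wins,
# roughly mimicking ANTLR 4's longest-match lexing.
RULES = [
    ("STRING", re.compile(r'"(?:\\.|[^\\"])*"', re.DOTALL)),
    # Hypothetical ESCAPE rule: glues a backslash to the next character,
    # so no token can end between the \ and the " that follows it.
    ("ESCAPE", re.compile(r'\\.', re.DOTALL)),
]


def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Keep the strictly longest match, so ties go to the earlier rule.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best:
            tokens.append((best[0], best[1].group()))
            pos = best[1].end()
        else:
            # Assumption: any other character becomes its own CHAR token.
            tokens.append(("CHAR", text[pos]))
            pos += 1
    return tokens


print(tokenize(r'/id\=\"Testing\"'))  # both quotes are consumed by ESCAPE
print(tokenize(r'/id\="Testing"'))    # the quotes delimit a STRING
```

Here the first input produces no STRING token at all, while the second produces the STRING "Testing". One consequence worth checking: because ESCAPE consumes backslashes in pairs, an input such as \\"a" still lexes "a" as a STRING, which differs from a literal reading of "look only at the one character before the quote".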

Sam Harwell