I'm trying to create a grammar for a language that uses double quotes for strings and allows escaping of quotes with a backslash. I'm using ANTLR4 for parsing the input.
I've defined the following rule for matching strings:
    STRING:
        '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
        ;
    fragment
    ESC_SEQ
        : '\\'
          ( // The standard escaped character set such as tab, newline, etc.
            [btnfr"'\\]
          |
          | // A Java style Unicode escape sequence
            UNICODE_ESC
          | // Invalid escape
            .
          | // Invalid escape at end of file
            EOF
          )
        ;
    fragment
    UNICODE_ESC
        : 'u' (HEX_DIGIT (HEX_DIGIT (HEX_DIGIT HEX_DIGIT?)?)?)?
        ;
However, this rule doesn't seem to match strings correctly when an escaped quote comes right before the closing quote. For example, the string "test \"string\" that works" is parsed correctly, but a string like "test string that does \"not work\"" is not. Other escapes such as \n work fine.
(I expect the escaped quotes to come out unescaped, e.g. "test string that "works"" as output.)
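To make the expected behaviour concrete, here is a rough Python sketch of what I would like to happen to both inputs. It is not ANTLR: `STRING_RE` is only a regex approximation of the STRING rule above, and `unescape` is a hypothetical helper that strips the surrounding quotes and resolves the simple escapes listed in ESC_SEQ (Unicode escapes are deliberately left untouched).

```python
import re

# Assumption: a regex approximation of the STRING lexer rule, not ANTLR itself.
# Opening quote, then any mix of backslash escapes and characters other than
# backslash or quote, then a closing quote.
STRING_RE = re.compile(r'"(?:\\.|[^\\"])*"', re.DOTALL)

# The simple escapes listed in ESC_SEQ, mapped to the characters they stand for.
SIMPLE_ESCAPES = {
    "b": "\b", "t": "\t", "n": "\n", "f": "\f", "r": "\r",
    '"': '"', "'": "'", "\\": "\\",
}


def unescape(token: str) -> str:
    """Hypothetical helper: drop the outer quotes and resolve simple escapes."""
    body = token[1:-1]
    # Unknown escapes (including \u...) are left as they are in this sketch.
    return re.sub(
        r"\\(.)",
        lambda m: SIMPLE_ESCAPES.get(m.group(1), m.group(0)),
        body,
        flags=re.DOTALL,
    )


samples = [
    r'"test \"string\" that works"',
    r'"test string that does \"not work\""',
]

for text in samples:
    matched = STRING_RE.fullmatch(text) is not None
    print(matched, unescape(text))
```

Under this approximation both samples match as a single token and the second one unescapes to `test string that does "not work"`. The sketch only describes the target behaviour; it does not reproduce the failure I see with ANTLR.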
I've tried modifying the rule by adding an explicit alternative for a backslash followed by a quote, like this:
    STRING:
        '"' ( ESC_SEQ | ~('\\'|'"') )* '"' | ('\\' '"'))
    fragment
    ESC_SEQ
        : '\\'
          ( // The standard escaped character set such as tab, newline, etc.
            [btnfr"'\\]
          |
          | // A Java style Unicode escape sequence
            UNICODE_ESC
          | // Invalid escape
            .
          | // Invalid escape at end of file
            EOF
          )
        ;
    fragment
    UNICODE_ESC
        : 'u' (HEX_DIGIT (HEX_DIGIT (HEX_DIGIT HEX_DIGIT?)?)?)?
        ;
    ;
But this still doesn't work.
What am I doing wrong? How can I modify my grammar to correctly match strings with escaped quotes?