I'm using ANTLR4 to parse text adventure game dialogue files written in Yarn, so the input is mostly free-form text with lots of island grammars. For the most part things are going smoothly, but I'm having trouble excluding certain text inside the Shortcut mode (used when presenting options for the player to choose from).

Basically I need to write a rule that matches anything except `#`, a newline, or `<<`. When the lexer hits a `<<`, it needs either to move into a new mode for handling expressions of various kinds, or to just leave the current mode so that the `<<` gets picked up by the already existing rules.

A cut-down version of my lexer (ignoring rules for expressions):

lexer grammar YarnLexer;

NEWLINE : ('\n') -> skip;

CMD : '<<' -> pushMode(Command);
SHORTCUT : '->' -> pushMode(Shortcut);

HASHTAG : '#' ;

LINE_GOBBLE : . -> more, pushMode(Line);

mode Line;
LINE : ~('\n'|'#')* -> popMode;

mode Shortcut ;
TEXT : CHAR+ -> popMode;
fragment CHAR : ~('#'|'\n'|'<');

mode Command ;
CMD_EXIT : '>>' -> popMode;

// RULES FOR OPERATORS/IDs/NUMBERS/KEYWORDS/etc
CMD_TEXT : ~('>')+ ;

And the parser grammar (again ignoring all the rules for expressions):

parser grammar YarnParser;

options { tokenVocab=YarnLexer; }

dialogue: statement+ EOF;

statement : line_statement | shortcut_statement | command_statement ;

hashtag : HASHTAG LINE ;

line_statement : LINE hashtag? ;

shortcut_statement : SHORTCUT TEXT command_statement? hashtag?;

command_statement : CMD expression CMD_EXIT;
expression : CMD_TEXT ;

I have tested the Command mode by itself and everything in there works fine, but when I try to parse my example input:

Where should we go?
-> the park
-> the zoo
-> Peter's house <<if $metPeter == true >>

ok shall we take the bus?
-> :<
-> ok

<<set $daySpent = true>>

My issue is that the line:

-> Peter's house <<if $metPeter == true >>

gets matched completely as TEXT, and the CMD rule just gets ignored in favour of the far longer TEXT match.

My first thought was to add `<` to the set, but then I can't have text like:

-> :<

which should be perfectly valid. Any idea how to do this?

McJones
  • Have you tried to include `<<` (and `>>`) into the list of "forbidden" tokens of the rule `CHAR`? Something like `fragment CHAR: ~('\n' | '#' | '<<' | '>>');`? – Raven Jul 17 '17 at 09:00
  • yeah that was the first thing I tried but turns out you can't have multi-char literals in sets in ANTLR. – McJones Jul 17 '17 at 09:38

1 Answer

Adding a single left angle bracket to the exclusion list creates a single corner case that is easily handled:

TEXT : CHAR+ ;
CMD  : '<<' -> pushMode(Command);
LAB  : '<'  -> type(TEXT) ;

fragment CHAR : ~('\n' | '#' | '<') ;
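
If these rules are moved into the question's `Shortcut` mode, one possible arrangement is to let the end of the line, rather than `TEXT`, pop the mode. The following is only a sketch built on the lexer from the question: the rule names `SHORTCUT_CMD`, `SHORTCUT_HASHTAG` and `SHORTCUT_END` are made up for illustration, and it assumes that every shortcut option ends at a newline (or at the end of the file).

```antlr
lexer grammar YarnLexer;

// Default mode, unchanged from the question.
NEWLINE  : '\n' -> skip ;
CMD      : '<<' -> pushMode(Command) ;
SHORTCUT : '->' -> pushMode(Shortcut) ;
HASHTAG  : '#' ;
LINE_GOBBLE : . -> more, pushMode(Line) ;

mode Line;
LINE : ~('\n' | '#')* -> popMode ;

mode Shortcut;
// A run of ordinary characters. It stops before '#', '\n' or any '<',
// and it no longer pops the mode.
TEXT : CHAR+ ;
// '<<' is two characters long, so longest match prefers it over LAB.
// It is emitted as CMD so the parser rules for commands still apply.
SHORTCUT_CMD : '<<' -> type(CMD), pushMode(Command) ;
// A lone '<' is plain text (the corner case from the answer).
LAB : '<' -> type(TEXT) ;
// Assumption: a hashtag ends the option text, so hand the rest of the
// line back to the default mode, where it is lexed as LINE.
SHORTCUT_HASHTAG : '#' -> type(HASHTAG), popMode ;
// Assumption: the newline is what closes a shortcut option.
SHORTCUT_END : '\n' -> skip, popMode ;

fragment CHAR : ~('\n' | '#' | '<') ;

mode Command;
CMD_EXIT : '>>' -> popMode ;
CMD_TEXT : ~('>')+ ;
```

With this layout a lone `<` arrives as its own `TEXT` token, so `-> :<` would produce two `TEXT` tokens in a row. The parser rule would presumably need `TEXT+` in place of `TEXT`, for example `shortcut_statement : SHORTCUT TEXT+ command_statement? hashtag? ;`. Text that appears between a command and a hashtag on the same line is not covered by this sketch.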
GRosenberg
  • Ah this works, excellent! But there is an issue when I then try and embed all this into its own lexer mode. I was planning on using the TEXT to pop the mode, but that doesn't work when a line ends with a `<` – McJones Jul 18 '17 at 02:32
  • Why not use the guard pattern `'>>'` to terminate the mode? If this is not the entire answer, you will need to show your grammar and valid input use cases to get meaningful help. – GRosenberg Jul 18 '17 at 02:47
  • very good point, updated my original question to show my grammar – McJones Jul 18 '17 at 03:31
  • The corner case (`LAB`) rule is missing. BTW, how do you know the line is being captured as `TEXT` and not `LINE`? Need to [dump the token stream](https://stackoverflow.com/questions/29197727/antlr-4-5-parser-error-during-runtime/29198883#29198883) to check. – GRosenberg Jul 18 '17 at 18:49
  • Even with the `LAB` rule the problem still persists, detecting a `<` and emitting it as a `TEXT` still has the issue of not popping out of `Shortcut` mode. – McJones Jul 19 '17 at 00:37