The DSL I'm working on allows users to define a 'complete text substitution' variable. When the parser encounters such a variable, we need to look up its value and continue parsing from that text, as though it had been written in the source at that point.
The substituted text can be as simple as a single constant, or as large as entire statements or code blocks. This is a mock grammar which I hope illustrates my point.
grammar a;
entry
: (set_variable
| print_line)*
;
set_variable
: 'SET' ID '=' STRING_CONSTANT ';'
;
print_line
: 'PRINT' ID ';'
;
STRING_CONSTANT: '\'' ('\'\'' | ~('\''))* '\'' ;
ID: [a-z][a-zA-Z0-9_]* ;
VARIABLE: '&' ID;
BLANK: [ \t\n\r]+ -> channel(HIDDEN) ;
Then the following statements, executed consecutively, should be valid:
SET foo = 'Hello world!';
PRINT foo;
SET bar = 'foo;';
PRINT &bar // should be interpreted as 'PRINT foo;'
SET baz = 'PRINT foo; PRINT'; // one complete statement and one incomplete statement
&baz foo; // should be interpreted as 'PRINT foo; PRINT foo;'
Any time a & variable token is encountered, we immediately switch to interpreting the value of that variable instead. As shown above, this means the code can be written in a way that is invalid on its own, full of half-statements that only become complete when the variable holds just the right value. Variables can be redefined at any point in the text.
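To make the intended behaviour concrete, here is a rough sketch in plain Python (not ANTLR, and not a proposed solution) of the semantics I'm after. The names `run`, `next_token`, `expect` and the `TOKEN` regex are made up for illustration. It assumes every SET statement is terminated with ';' and that the input contains no comments. Whenever the tokenizer meets `&name`, it splices the variable's current value in front of the remaining input and keeps lexing from there.

```python
import re

# Hypothetical token pattern loosely mirroring the mock grammar:
# string constant | &variable | identifier or keyword | '=' or ';'
TOKEN = re.compile(
    r"\s*(?:('(?:''|[^'])*')|&([a-z][a-zA-Z0-9_]*)|([A-Za-z][A-Za-z0-9_]*)|([=;]))"
)

def run(source):
    """Interpret SET/PRINT statements, expanding &vars while lexing."""
    variables = {}  # variable name -> unquoted string value
    output = []     # values "printed" so far
    text, pos = source, 0

    def next_token():
        nonlocal text, pos
        while True:
            if not text[pos:].strip():
                return None  # only whitespace left: end of input
            m = TOKEN.match(text, pos)
            if m is None:
                raise SyntaxError(f"unexpected input at {text[pos:pos + 20]!r}")
            string, var, word, punct = m.groups()
            if var is not None:
                # Substitute: the value replaces the &var and is lexed next.
                text = variables[var] + text[m.end():]
                pos = 0
                continue
            pos = m.end()
            return string or word or punct

    def expect(expected):
        tok = next_token()
        if tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")

    while (tok := next_token()) is not None:
        if tok == "SET":
            name = next_token()
            expect("=")
            value = next_token()
            if value is None or not value.startswith("'"):
                raise SyntaxError(f"expected a string constant, got {value!r}")
            expect(";")
            # Strip the outer quotes and unescape doubled quotes.
            variables[name] = value[1:-1].replace("''", "'")
        elif tok == "PRINT":
            name = next_token()
            expect(";")
            output.append(variables[name])
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
    return output

program = """
SET foo = 'Hello world!';
PRINT foo;
SET bar = 'foo;';
PRINT &bar
SET baz = 'PRINT foo; PRINT';
&baz foo;
"""
print(run(program))  # a list containing 'Hello world!' four times
```

Because the value is pushed back into the input, a value that itself contains `&vars` would be expanded as well in this sketch (and a self-referencing variable would loop forever). That goes beyond what I actually need.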
Strictly speaking, the current language definition doesn't disallow nesting &vars inside each other, but the current parser doesn't handle this, and I would not be upset if it weren't allowed.
I'm currently building the interpreter using a visitor, but I'm stuck on this feature.
How can I build a lexer/parser/interpreter which will allow me to do this? Thanks for any help!