
I'm working on a simple procedural interpreted scripting language, written in Java using ANTLR4. Just a hobby project. I have written a few DSLs using ANTLR4 and the lexer and parser presented no real problems. I got quite a bit of the language working by interpreting directly from the parse tree but that strategy, apart from being slow, started to break down when I started to add functions.

So I've created a stack-based virtual machine, based on Chapter 10 of "Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages". I have an assembler for the VM that works well and I'm now trying to make the scripting language generate assembly via an AST.

Something I can't quite see is how to detect when an expression or function result is unused, so that I can generate a POP instruction to discard the value from the top of the operand stack.

I want things like assignment statements to be expressions, so that I can do things like:

x = y = 1;

In the AST, the assignment node is annotated with the symbol (the lvalue) and the rvalue comes from visiting the children of the assignment node. At the end of the visit of the assignment node, the rvalue is stored into the lvalue, and this is reloaded back into the operand stack so that it can be used as an expression result.

This generates (for x = y = 1):

CLOAD 1    ; Push constant value
GSTOR y    ; Store into global y and pop
GLOAD y    ; Push value of y
GSTOR x    ; Store into global x and pop
GLOAD x    ; Push value of x 

But it needs a POP instruction at the end to discard the result, otherwise the operand stack starts to grow with these unused results. I can't see the best way of doing this.
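To make the stack growth concrete, here is a minimal sketch that tallies the net operand-stack effect of a listing. The class `StackDepth` and its methods are hypothetical names for illustration. The per-opcode effects are assumptions taken from the comments in the listing above: loads push one value, while stores and POP remove one.

```java
import java.util.List;

final class StackDepth {
    // Net change in operand-stack depth caused by one opcode (assumed effects).
    static int effect(String opcode) {
        switch (opcode) {
            case "CLOAD":
            case "GLOAD":
                return 1;   // pushes one value
            case "GSTOR":
            case "POP":
                return -1;  // removes one value
            default:
                throw new IllegalArgumentException("unknown opcode: " + opcode);
        }
    }

    // Sums the effects of every instruction; the opcode is the first token on a line.
    static int netDepth(List<String> program) {
        int depth = 0;
        for (String line : program) {
            String opcode = line.trim().split("\\s+")[0];
            depth += effect(opcode);
        }
        return depth;
    }

    public static void main(String[] args) {
        List<String> code = List.of("CLOAD 1", "GSTOR y", "GLOAD y", "GSTOR x", "GLOAD x");
        System.out.println(netDepth(code)); // prints 1: one value is left behind
    }
}
```

A statement should leave the depth where it found it, so a non-zero result for a whole statement is the sign that a trailing POP is missing.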

I guess my grammar could be flawed, which is preventing me seeing a solution here.

grammar g;

// ----------------------------------------------------------------------------
// Parser
// ----------------------------------------------------------------------------

parse
    : (functionDefinition | compoundStatement)*
    ;

functionDefinition
    : FUNCTION ID parameterSpecification compoundStatement
    ;

parameterSpecification
    : '(' (ID (',' ID)*)? ')'
    ;

compoundStatement
    : '{' compoundStatement* '}'
    | conditionalStatement
    | iterationStatement
    | statement ';'
    ;

statement
    : declaration
    | expression
    | exitStatement
    | printStatement
    | returnStatement
    ;

declaration
    : LET ID ASSIGN expression                                                  # ConstantDeclaration
    | VAR ID ASSIGN expression                                                  # VariableDeclaration
    ;

conditionalStatement
    : ifStatement
    ;

ifStatement
    : IF expression compoundStatement (ELSE compoundStatement)?
    ;

exitStatement
    : EXIT
    ;

iterationStatement
    : WHILE expression compoundStatement                                        # WhileStatement
    | DO compoundStatement WHILE expression                                     # DoStatement
    | FOR ID IN expression TO expression (STEP expression)? compoundStatement   # ForStatement
    ;

printStatement
    : PRINT '(' (expression (',' expression)*)? ')'                             # SimplePrintStatement
    | PRINTF '(' STRING (',' expression)* ')'                                   # PrintFormatStatement
    ;

returnStatement
    : RETURN expression?
    ;

expression
    : expression '[' expression ']'                                             # Indexed
    | ID DEFAULT expression                                                     # DefaultValue
    | ID op=(INC | DEC)                                                         # Postfix
    | op=(ADD | SUB | NOT) expression                                           # Unary
    | op=(INC | DEC) ID                                                         # Prefix
    | expression op=(MUL | DIV | MOD) expression                                # Multiplicative
    | expression op=(ADD | SUB) expression                                      # Additive
    | expression op=(GT | GE | LT | LE) expression                              # Relational
    | expression op=(EQ | NE) expression                                        # Equality
    | expression AND expression                                                 # LogicalAnd
    | expression OR expression                                                  # LogicalOr
    | expression IF expression ELSE expression                                  # Ternary
    | ID '(' (expression (',' expression)*)? ')'                                # FunctionCall
    | '(' expression ')'                                                        # Parenthesized
    | '[' (expression (',' expression)* )? ']'                                  # LiteralArray
    | ID                                                                        # Identifier
    | NUMBER                                                                    # LiteralNumber
    | STRING                                                                    # LiteralString
    | BOOLEAN                                                                   # LiteralBoolean
    | ID ASSIGN expression                                                      # SimpleAssignment
    | ID op=(CADD | CSUB | CMUL | CDIV) expression                              # CompoundAssignment
    | ID '[' expression ']' ASSIGN expression                                   # IndexedAssignment
    ;

// ----------------------------------------------------------------------------
// Lexer
// ----------------------------------------------------------------------------

fragment
IDCHR           : [A-Za-z_$];

fragment
DIGIT           : [0-9];

fragment
ESC             : '\\' ["\\];

COMMENT         : '#' .*? '\n' -> skip;

// ----------------------------------------------------------------------------
// Keywords
// ----------------------------------------------------------------------------

DO              : 'do';
ELSE            : 'else';
EXIT            : 'exit';
FOR             : 'for';
FUNCTION        : 'function';
IF              : 'if';
IN              : 'in';
LET             : 'let';
PRINT           : 'print';
PRINTF          : 'printf';
RETURN          : 'return';
STEP            : 'step';
TO              : 'to';
VAR             : 'var';
WHILE           : 'while';

// ----------------------------------------------------------------------------
// Operators
// ----------------------------------------------------------------------------

ADD             : '+';
DIV             : '/';
MOD             : '%';
MUL             : '*';
SUB             : '-';

DEC             : '--';
INC             : '++';

ASSIGN          : '=';
CADD            : '+=';
CDIV            : '/=';
CMUL            : '*=';
CSUB            : '-=';

GE              : '>=';
GT              : '>';
LE              : '<=';
LT              : '<';

AND             : '&&';
EQ              : '==';
NE              : '!=';
NOT             : '!';
OR              : '||';

DEFAULT         : '??';

// ----------------------------------------------------------------------------
// Literals and identifiers
// ----------------------------------------------------------------------------

BOOLEAN         : ('true'|'false');
NUMBER          : DIGIT+ ('.' DIGIT+)?;
STRING          : '"' (ESC | .)*? '"';
ID              : IDCHR (IDCHR | DIGIT)*;

WHITESPACE      : [ \t\r\n] -> skip;
ANYCHAR         : . ;

So my question is where is the usual place to detect unused expression results, i.e. when expressions are used as plain statements? Is it something I should detect during the parse, then annotate the AST node? Or is this better done when visiting the AST for code generation (assembly generation in my case)? I just can't see where best to do it.

san-ho-zay
  • Is that stack just used to evaluate expressions? If so, shouldn't it be limited to a single expression / erased when leaving expression? – Jiri Tousek May 10 '18 at 10:52
  • The operand stack is in the virtual machine and has no knowledge of expressions or statements, it just executes low-level instructions. The ADD instruction, for example, just pops two operands off the top of the stack and pushes the sum. It's up to the code generator to generate instructions that use the value or discard it. I did think about having a CLEAR instruction to clear down the operand stack but that doesn't feel the right solution. I'd still need to generate the CLEAR instruction, and that's basically the same problem as working out when to call POP. – san-ho-zay May 10 '18 at 13:18

2 Answers


IMO it's not a question of the right grammar, but of how you process the AST/parse tree. Whether a result is used or not can be determined by checking the node's siblings (and its parent's siblings, etc.). An assignment, for instance, is made up of the lvalue, the operator and the rvalue, so once you have determined the rvalue, check whether the previous sibling in the tree is an operator. Similarly, you can check whether the parent is a parenthesized expression (for nested function calls, grouping, etc.).
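A minimal sketch of the parent-based check, assuming a tree whose nodes keep a reference to their parent. The `ParentCheck`, `Node` and `Kind` names are hypothetical and are not ANTLR classes. The sketch tests for a `statement` parent specifically, because the condition of an `if` or `while` also has a non-expression parent even though its value is consumed.

```java
import java.util.ArrayList;
import java.util.List;

final class ParentCheck {
    enum Kind { STATEMENT, IF, EXPRESSION }

    // Hypothetical tree node that records its parent when it is attached.
    static final class Node {
        final Kind kind;
        final String text;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(Kind kind, String text, Node... kids) {
            this.kind = kind;
            this.text = text;
            for (Node kid : kids) {
                kid.parent = this;
                children.add(kid);
            }
        }
    }

    // An expression's value is discarded only when it sits directly under a statement.
    static boolean isResultDiscarded(Node expr) {
        return expr.kind == Kind.EXPRESSION
                && expr.parent != null
                && expr.parent.kind == Kind.STATEMENT;
    }

    public static void main(String[] args) {
        Node one = new Node(Kind.EXPRESSION, "1");
        Node inner = new Node(Kind.EXPRESSION, "y = 1", one);
        Node outer = new Node(Kind.EXPRESSION, "x = y = 1", inner);
        Node stmt = new Node(Kind.STATEMENT, "x = y = 1;", outer);

        System.out.println(isResultDiscarded(outer)); // true: needs a POP
        System.out.println(isResultDiscarded(inner)); // false: consumed by the outer assignment
    }
}
```

With ANTLR4's generated parse tree the same test could be written against the context's `getParent()` reference; that is left out here to keep the sketch free of dependencies.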

Mike Lischke
  • I'm not going to rule this out but my current AST design (which is still fluid) doesn't suit this approach. My AST nodes have references to their children but not back to their parents. I'm clearer now that I need to identify expressions used as statements and any expression node where the parent node is not an expression should be an expression statement. I just think identifying that in the grammar rather than looking up the AST might be the better solution for my current AST design. – san-ho-zay May 10 '18 at 13:47
  • If you'd use the generated parse tree instead of your own AST you would have that info at hand. Nodes have a reference to their parent. – Mike Lischke May 10 '18 at 16:51
statement
    : ...
    | expression

If you label this alternative with # ExpressionStatement, you can generate a POP after every expression statement by overriding exitExpressionStatement() in the listener or visitExpressionStatement() in the visitor.
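As a rough illustration of the same idea on a hand-built AST rather than the ANTLR listener or visitor, the sketch below wraps an expression in a statement node that emits the POP. All class names (`ExprStmtDemo`, `Const`, `Assign`, `ExprStmt`) are hypothetical; the opcodes are the ones from the question's listing.

```java
import java.util.ArrayList;
import java.util.List;

final class ExprStmtDemo {
    interface Node {
        void emit(List<String> out);
    }

    // Literal constant: pushes its value.
    static final class Const implements Node {
        final String value;
        Const(String value) { this.value = value; }
        public void emit(List<String> out) { out.add("CLOAD " + value); }
    }

    // Assignment as an expression: store, then reload so the value stays on the stack.
    static final class Assign implements Node {
        final String name;
        final Node rvalue;
        Assign(String name, Node rvalue) { this.name = name; this.rvalue = rvalue; }
        public void emit(List<String> out) {
            rvalue.emit(out);
            out.add("GSTOR " + name);
            out.add("GLOAD " + name);
        }
    }

    // Expression used as a statement: nobody consumes the value, so discard it.
    static final class ExprStmt implements Node {
        final Node expr;
        ExprStmt(Node expr) { this.expr = expr; }
        public void emit(List<String> out) {
            expr.emit(out);
            out.add("POP");
        }
    }

    static List<String> compile(Node root) {
        List<String> out = new ArrayList<>();
        root.emit(out);
        return out;
    }

    public static void main(String[] args) {
        // x = y = 1;
        Node stmt = new ExprStmt(new Assign("x", new Assign("y", new Const("1"))));
        compile(stmt).forEach(System.out::println);
    }
}
```

Only the statement wrapper knows the value is unused, so the expression nodes need no special cases. Note that ANTLR4 requires either all or none of a rule's alternatives to be labeled, so the other alternatives of `statement` would need labels as well.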

sepp2k
  • Of course. Face meets palm. In practice, I'd need an ExpressionStatement node in the AST and generate the POP when walking the AST but that's just implementation detail. I'll try it before accepting the answer. – san-ho-zay May 10 '18 at 13:26