How do efficiently differentiate between different tokens in a rule in ANTLR4?

Question

I have a simple grammar rule:

expr    : expr (EQUALS | NOT_EQUALS) expr
        | literal;
literal : ...; // omitted here

The lexer recognizes EQUALS and NOT_EQUALS:

EQUALS     : '=';
NOT_EQUALS : '!=';

In my code, I want to differentiate between the equals and not equals case. I am wondering how to do this efficiently. Currently, I implement the visitor as following:

public Expression visitExpr(ExprContext ctx) {
    if (ctx.EQUALS() != null) {
        return EqualsExpression.construct(ctx.expr());
    } else if (ctx.NOT_EQUALS() != null) {
        return NotEqualsExpression.construct(ctx.expr());
    } else if (ctx.literal() != null) {
        return LiteralExpression.construct(ctx.literal());
    }
    throw new IllegalStateException();
}

I am not sure if this is very efficient, since EQUALS()/NOT_EQUALS() calls getToken(), which basically loops through all children. This is done multiple times, so I am not sure if this is smart. Also, I call literal() twice. Regarding the latter, I know I could cache in a local variable, but this will end up in pretty ugly code rather quickly if there are multiple children rules to be considered.

Is there a way to do this more efficiently? A switch/case statement based on some kind of token identifier or branch identifier would be more ideal?

sidenote

I could split the expr rule into multiple rules as following:

expr         : expr_eq | expr_not_eq | expr_literal
expr_eq      : expr EQUALS expr
expr_not_eq  : expr NOT_EQUALS expr
expr_literal : literal

Now, every possible branch will be visited separately by the visitor:

public Expression visitExprEx(ExprEqContext ctx) {
    return EqualsExpression.construct(ctx.expr());
}

public Expression visitExprNotEq(ExprNotEqContext ctx) {
    return NotEqualsExpression.construct(ctx.expr());
}

public Expression visitExprLiteral(ExprLiteralContext ctx) {
    return LiteralExpression.construct(ctx.literal());
}

But looking at the G4 grammars on Github (https://github.com/antlr/grammars-v4), this is rarely done. So I am not sure if this is the way to go forward.

"how to do this efficiently" worry about efficiency when you've measured that that particular part of your code is a hotspot/problem. In your case, I'd go for so called _alternative labels_.See : https://github.com/antlr/antlr4/blob/master/doc/parser-rules.md#alternative-labels I'm pretty sure your 2 solutions and _alternative labels_ are all equally efficient. — Bart Kiers, Aug 05 '18 at 09:56

score 1 · Answer 1 · answered Aug 05 '18 at 10:26

Never guess performance just from looking at the code. Measure it!

The child lists are pretty short, which means getToken only does 1-2 loops before finding either EQUALS or NOT_EQUALS. Other parts of your code likely require much more time than that lookup.

However, if you want to get out every piece of performance avoid the convenience methods and do things manually, as you can optimize the access due to the knowlege you have about the grammar. In this special case your expr rule can only have one or 3 children. Looking at the first variant it can get a single literal rule child node or an expr rule child node, a token child for the inner alt and another expr rule child node. All you have to do is to check:

if (ctx.getChildCount() > 1 && ((TerminalNode)ctx.getChild(1)).getSymbol().getType() == YourParser.EQUALS)

How do efficiently differentiate between different tokens in a rule in ANTLR4?

1 Answers1