
Here is some demo code:

label:
var id
let id = 10
goto label

If keywords are allowed as identifiers, it becomes:

let:
var var
let var = 10
goto let

This is totally legal code, but it seems very hard to handle in ANTLR.

AFAIK, once the ANTLR lexer matches a let token, it never falls back to the ID token, so ANTLR sees:

LET_TOKEN :
VAR_TOKEN <missing ID_TOKEN>VAR_TOKEN
LET_TOKEN <missing ID_TOKEN>VAR_TOKEN = 10

Although ANTLR allows predicates, I would have to control every token match, which is error-prone. The grammar becomes this:

grammar Demo;
options {
  language = Go;
}
@parser::members{
    var _need = map[string]bool{}
    func skip(name string,v bool){
        _need[name] = !v
        fmt.Println("SKIP",name,v)
    }
    func need(name string)bool{
        fmt.Println("NEED",name,_need[name])
        return _need[name]
    }
}

proj@init{skip("inst",false)}: (line? NL)* EOF;
line
    : VAR ID
    | LET ID EQ? Integer
    ;

NL: '\n';
VAR: {need("inst")}? 'var' {skip("inst",true)};
LET: {need("inst")}? 'let' {skip("inst",true)};
EQ: '=';

ID: ([a-zA-Z] [a-zA-Z0-9]*);
Integer: [0-9]+;

WS: [ \t] -> skip;

It looks terrible.

But this is easy in PEG; you can test this in pegjs:

Expression = (Line? _ '\n')* ;

Line
  = 'var' _ ID
  / 'let' _ ID _ "=" _ Integer

Integer "integer"
  = [0-9]+ { return parseInt(text(), 10); }

ID = [a-zA-Z] [a-zA-Z0-9]*

_ "whitespace"
  = [ \t]*

I have actually done this in peggo and JavaCC.

My question is how to handle these grammars in ANTLR 4.6. I was so excited about the ANTLR 4.6 Go target, but it seems I chose the wrong tool for my grammar?

wener

1 Answer


The simplest way is to define a parser rule for identifiers:

id: ID | VAR | LET;

VAR: 'var';
LET: 'let';
ID: [a-zA-Z] [a-zA-Z0-9]*;

And then use id instead of ID in your parser rules.
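
As a rough sketch, this is how the question's grammar might look with that change applied. It assumes the same token and rule names as the question, drops the predicates and the Go members block, and covers only the var and let lines that the original grammar covers (labels and goto are not handled there either):

```
grammar Demo;

proj: (line? NL)* EOF;

line
    : VAR id
    | LET id EQ? Integer
    ;

// Parser-level identifier: a plain ID, or a keyword token used as a name.
id: ID | VAR | LET;

NL: '\n';
VAR: 'var';
LET: 'let';
EQ: '=';

ID: [a-zA-Z] [a-zA-Z0-9]*;
Integer: [0-9]+;

WS: [ \t] -> skip;
```

With this, the lexer still produces VAR for the second word in "var var", but the parser accepts it because id allows VAR at that position. Every new keyword has to be added to the id rule as well.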

A different way is to use ID for identifiers and keywords, and use predicates for disambiguation. But it's less readable, so I'd use the first way instead.

Lucas Trzesniewski
  • Unfortunately in this situation, when an error message is generated by antlr4, it tells the user that either ID, var or let is expected, which is confusing. Do you know if there is a way to get antlr4 to tell the user that only ID is expected? – user1241663 Jan 15 '20 at 17:11
  • @user1241663 good point. I never really tried to customize error messages, but I suppose you could supply your own `IParserErrorListener` to tweak the generated errors. I don't know if there's a better way. – Lucas Trzesniewski Jan 15 '20 at 21:40
  • Extending `DefaultErrorStrategy` lets me customize the error message. – user1241663 Jan 17 '20 at 03:12