How can I tell ANTLR to use only the branch I specify, not other branches?

Question

Here is my ANTLR grammar. It is divided into two sections, parameters and constraints. The parameters section consists of rows, and each row represents a parameter and its values. A parameter is separated from its values by a colon (:), and the values are separated from each other by commas (,).

The grammar of the constraints section comes from PICT's GitHub repository; I converted it into ANTLR grammar format.

grammar Pict;
model:parameters? constraints?;
//The part of Parameters and Values of Parameters
parameters:parameterRow+ '\n'*;
parameterRow: ' '* parameterName  SEMI  parameterValue (',' ' '* parameterValue)* '\n'*;
parameterName: Value ;
parameterValue:NUMBER|Value;

//The part of submodel
//submodel:;

//The part of constraints
constraints: constraint+ '\n'*;
constraint:(predicate ';'? '\n'*)|((IF|IFNOT) predicate THEN predicate (ELSE predicate)?) ';'? '\n'*;
predicate:
clause
|(clause LogicalOperator predicate)
;
clause:term
|'(' ' '* predicate ' '* ')'
|NOT predicate
;

term:
'['parameterName']' ' '* IN ' '*  '{' ' '* (String|NUMBER) ' '* (',' ' '* (NUMBER|String))* ' '* '}' #inStatment
|'['parameterName']' ' '* Relation ' '* (NUMBER|String) #relationValueStatement
| '['parameterName']' ' '* LIKE' '*  (NUMBER|String) #likeStatement
|'['parameterName']' ' '* Relation ' '* '['parameterName']'#relationParaStatement
;




SEMI:[ ]*':'[ ]* {setText(getText().trim());};
IN: ([ ]* 'in' [ ]* | [ ]* 'IN' [ ]*) {setText(getText().trim());};
LIKE:([ ]* ('LIKE'|'like') [ ]*) {setText(getText().trim());};
Relation:  ('='|'<>'|'>'|'>='|'<'|'<=' ) {setText(getText().trim());};
IF:[ '\n']* ('IF'|'if') [ '\n']*;
IFNOT:[ '\n']* ('IF NOT'|'if not') [ '\n']*;
THEN:[ '\n']* ('THEN'|'then') [ '\n']*;
ELSE:[ '\n']* ('ELSE'|'else') [ '\n']*;
NOT:[ '\n']* ('NOT'|'not') [ '\n']*;
LogicalOperator:([ '\n']* ('and'|'AND') [ '\n']*)|([ '\n']* ('OR'|'or') [ '\n']*) {setText(getText().trim());};

NUMBER
    :   '-'? INT '.' INT EXP?   // 1.35, 1.35E-9, 0.3, -4.5
    |   '-'? INT EXP            // 1e10 -3e4
    |   '-'? INT             // -3, 45
    ;

Value:LETTERNoWhiteSpace[-.?!a-zA-Z\u4e00-\u9fa5_0-9\u3002|\uff1f|\uff01|\uff0c|\u3001|\uff1b|\uff1a|\u201c|\u201d|\u2018|\u2019|\uff08|\uff09|\u300a|\u300b|\u3008|\u3009|\u3010|\u3011|\u300e|\u300f|\u300c|\u300d|\ufe43|\ufe44|\u3014|\u3015|\u2026|\u2014|\uff5e|\ufe4f|\uffe5]*(' ')?[-.?!a-zA-Z\u4e00-\u9fa5_0-9\u3002|\uff1f|\uff01|\uff0c|\u3001|\uff1b|\uff1a|\u201c|\u201d|\u2018|\u2019|\uff08|\uff09|\u300a|\u300b|\u3008|\u3009|\u3010|\u3011|\u300e|\u300f|\u300c|\u300d|\ufe43|\ufe44|\u3014|\u3015|\u2026|\u2014|\uff5e|\ufe4f|\uffe5]*{setText(getText().trim());};



String:('"' .*? '"') {setText(getText().trim());};
WS:[ \t\r\n]+ -> skip ;
COMMENT: '#' .*? '\n' ->skip;
fragment INT :   '0' | '1'..'9' '0'..'9'* ; // no leading zeros
fragment EXP :   [Ee] [+\-]? INT ; // \- since - means "range" inside [...]
fragment
LETTERNoWhiteSpace:[-a-zA-Z\u4e00-\u9fa5_0-9];

For the lexer rule Value, I need it to match all English and Chinese characters, as well as all English and Chinese punctuation, so I used Unicode escapes (starting with \u) to do it.

My input is:

Size:  1, 2, 3, 4, 5
Value: a, b, c, d

IF [Size] > 3 THEN [Value] > "b";

and ANTLR reports:

line 4:12 no viable alternative at input '[Size] > 3 THEN'

[Image: syntax tree]

I found that 3 THEN is matched by the lexer rule Value, but I want 3 to be matched by NUMBER or String, as in my grammar above. THEN is a keyword, so it should not be matched as part of a Value.

How can I change my grammar to solve this problem? Thanks!

Sir, I commented on your pull request; there is a little bug in your grammar: https://github.com/antlr/grammars-v4/pull/2892 — Fane Xiang, Oct 20 '22 at 07:27

score 0 · Accepted Answer · answered Oct 18 '22 at 12:52

It's probably going to help to clean things up a bit (will make things easier to digest).

Most obvious: You have a WS rule with a skip action so you can drop all of the [ ]* (and similar) stuff. This also means you don't need the {setText(getText().trim());} stuff.
You can use options { caseInsensitive = true; } to avoid things like IF: ('IF' | 'if');
A | in a set ([abd|c]) is the actual | character, not an or operator, so you don't want stuff like \uff0c|\u3001|\uff1b|\uff1a (it should be \uff0c\u3001\uff1b\uff1a)
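If the point about | inside a set is surprising, here is a quick way to convince yourself. This is Python rather than ANTLR, used only as an analogy: Python's re character classes treat | the same way, as an ordinary member of the set. The names wrong and right are made up for this sketch.

```python
import re

# Intended meaning: "either of these two Chinese punctuation marks".
wrong = re.compile(r"[\uff0c|\u3001]")  # '|' becomes a third, literal member of the set
right = re.compile(r"[\uff0c\u3001]")   # just the two punctuation characters

print(bool(wrong.fullmatch("|")))       # True: the set accidentally accepts '|'
print(bool(right.fullmatch("|")))       # False
print(bool(right.fullmatch("\uff0c")))  # True: the intended characters still match
```

So the extra | characters in the original Value rule do not separate alternatives; they just add | to the characters a Value may contain.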

Applying these cleanups gives you:

grammar Pict
    ;

options {
    caseInsensitive = true;
}

model: parameterRow* constraint*;
//The part of Parameters and Values of Parameters parameters: parameterRow;
parameterRow
    : parameterName COLON parameterValue (',' parameterValue)*
    ;
parameterName:  Value;
parameterValue: NUMBER | Value;

//The part of submodel submodel:;

//The part of constraints constraints: constraint+;
constraint
    : predicate ';'?
    | (IF | IFNOT) predicate THEN predicate (ELSE predicate)? ';'?
    ;
predicate: clause | (clause LogicalOperator predicate);
clause:    term | '(' predicate ')' | NOT predicate;

term
    : '[' parameterName ']' IN '{' (String | NUMBER) (
        ',' (NUMBER | String)
    )* '}'                                                 # inStatment
    | '[' parameterName ']' Relation (NUMBER | String)     # relationValueStatement
    | '[' parameterName ']' LIKE (NUMBER | String)         # likeStatement
    | '[' parameterName ']' Relation '[' parameterName ']' # relationParaStatement
    ;

COLON: ':';
IN:    'in';

LIKE:     'like';
Relation: ('=' | '<>' | '>' | '>=' | '<' | '<=');

IF:              'if';
IFNOT:           'if not';
THEN:            'then';
ELSE:            'else';
NOT:             'not';
LogicalOperator: ('and' | 'or');

NUMBER
    : '-'? INT '.' INT EXP? // 1.35, 1.35E-9, 0.3, -4.5
    | '-'? INT EXP // 1e10 -3e4
    | '-'? INT // -3, 45
    ;

Value
    : LETTERNoWhiteSpace
        [-.?!a-z\u4e00-\u9fa5_0-9\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]
        (
        ' '?
            [-.?!a-z\u4e00-\u9fa5_0-9\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]
    )*
    ;

String:       ('"' .*? '"') {setText(getText().trim());};
WS:           [ \t\r\n]+   -> skip;
COMMENT:      '#' .*? '\n' -> skip;
fragment INT: '0' | '1' ..'9' '0' ..'9'*; // no leading zeros
fragment EXP
    : 'e' [+\-]? INT
    ; // \- since - means "range" inside [...]
fragment LETTERNoWhiteSpace: [a-z\u4e00-\u9fa5_0-9];

This produces the following errors for your input:

line 2:7 token recognition error at: 'a,'
line 2:10 token recognition error at: 'b,'
line 2:13 token recognition error at: 'c,'
line 2:16 token recognition error at: 'd\n'
line 4:0 missing {NUMBER, Value} at 'IF'

So we can see that your Value rule doesn't recognize single-letter values. If you modify it to:

Value
    : LETTERNoWhiteSpace (
        [-.?!a-z\u4e00-\u9fa5_0-9\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]
            (
            ' '?
                [-.?!a-z\u4e00-\u9fa5_0-9\u3002\uff1f\uff01\uff0c\u3001\uff1b\uff1a\u201c\u201d\u2018\u2019\uff08\uff09\u300a\u300b\u3008\u3009\u3010\u3011\u300e\u300f\u300c\u300d\ufe43\ufe44\u3014\u3015\u2026\u2014\uff5e\ufe4f\uffe5]
        )*
    )?
    ;
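To see why making the tail optional matters, here is a rough Python sketch of the two shapes of the rule. It is an approximation under stated assumptions, not the ANTLR rule itself: HEAD and BODY are hypothetical ASCII-only stand-ins for LETTERNoWhiteSpace and the long character set (the Chinese ranges are left out for brevity), and re.IGNORECASE stands in for caseInsensitive.

```python
import re

HEAD = r"[a-z_0-9]"      # stand-in for LETTERNoWhiteSpace (ASCII only, an assumption)
BODY = r"[-.?!a-z_0-9]"  # stand-in for the long character set (ASCII only, an assumption)

# First version: one HEAD character, then at least one more BODY character is required.
before = re.compile(f"{HEAD}{BODY}(?: ?{BODY})*", re.IGNORECASE)
# Modified version: everything after the first character is optional.
after = re.compile(f"{HEAD}(?:{BODY}(?: ?{BODY})*)?", re.IGNORECASE)

for text in ["a", "Size", "big value"]:
    print(text, bool(before.fullmatch(text)), bool(after.fullmatch(text)))
# a False True
# Size True True
# big value True True
```

The first shape needs at least two characters, which is consistent with the token recognition errors reported for a, b, c and d.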

(Note: this Value rule is quite complex and, by allowing embedded spaces, is likely to cause tokenization problems in examples more complex than yours, but it works fine for your sample input.)
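As an illustration of the kind of trouble embedded spaces can cause, here is a small Python simulation of how a lexer picks tokens: the longest match wins, and on a tie the rule listed first wins. It is a sketch under assumptions rather than ANTLR itself. RULES holds simplified, ASCII-only stand-ins for three of the rules, next_token is a hypothetical helper, and the input with 10 is an invented example that is not taken from your question.

```python
import re

BODY = r"[-.?!a-z_0-9]"  # ASCII-only stand-in for the long character set (an assumption)

# Simplified stand-ins for three lexer rules, in the order they appear in the grammar.
RULES = [
    ("THEN", r"then"),
    ("NUMBER", r"-?(?:0|[1-9][0-9]*)"),  # integers only, to keep the sketch short
    ("Value", rf"[a-z_0-9](?:{BODY}(?: ?{BODY})*)?"),
]

def next_token(text, pos=0):
    """Return (rule, lexeme) for the longest match at pos; ties go to the earlier rule."""
    best = None
    for name, pattern in RULES:
        m = re.compile(pattern, re.IGNORECASE).match(text, pos)
        # Strictly longer matches replace the current best, so earlier rules win ties.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(next_token("3 THEN [Value]"))   # ('NUMBER', '3')
print(next_token("10 THEN [Value]"))  # ('Value', '10 THEN')
```

With a single digit, Value and NUMBER both match one character and NUMBER wins because it comes first. With two digits, the embedded space lets Value run on through the keyword, so it produces the longer match and wins. That is the same effect you saw with 3 THEN in your original grammar, and as far as I can tell it is a reason to consider dropping embedded spaces from Value altogether.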

With the modified Value rule there are no errors, and you get the following tree:

[Image: resulting parse tree]

Sir, you successfully solved my problem. Your answer helped me a lot and made me understand where my mistake occurred. I would like to sincerely thank you for your time and effort. But I got another question after switching to your grammar; it would be great if you could help me again to see why the error appears. Again, and not for the last time, thanks for your help. Your answer really helped me as a newbie to ANTLR. — Fane Xiang, Oct 19 '22 at 07:00
"But I got another question after switching to your grammar, it would be great if you could help me again to see why the error appears" like I previously suggested: take the time to understand what you are doing. The way you're trying to solve things now is not likely going to succeed without a better knowledge of ANTLR. — Bart Kiers, Oct 19 '22 at 07:23
If you have a new question, please just ask a new one on stackoverflow and do not append a completely new question to this existing question. — Bart Kiers, Oct 19 '22 at 07:27
I’ll second Bart (again). Questions that are continuously amended by “the next question” result in “questions” and “answers” that are difficult for future readers to follow and miss one of the main points of SO. Better to revert the edit adding another question (the answer now, doesn’t answer the edited question) and then just ask a new question altogether. — Mike Cargal, Oct 19 '22 at 11:27
