Here is my ANTLR grammar. It is divided into two sections, parameters and constraints.
The parameters section consists of multiple rows. Each row represents a parameter and its values. A parameter name and its values are separated by a colon (:), and the individual values are separated by commas (,).
The grammar for the constraints section comes from PICT's GitHub repository; I converted it into ANTLR grammar format:
grammar Pict;
model:parameters? constraints?;
//The part of Parameters and Values of Parameters
parameters:parameterRow+ '\n'*;
parameterRow: ' '* parameterName SEMI parameterValue (',' ' '* parameterValue)* '\n'*;
parameterName: Value ;
parameterValue:NUMBER|Value;
//The part of submodel
//submodel:;
//The part of constraints
constraints: constraint+ '\n'*;
constraint:(predicate ';'? '\n'*)|((IF|IFNOT) predicate THEN predicate (ELSE predicate)?) ';'? '\n'*;
predicate:
clause
|(clause LogicalOperator predicate)
;
clause:term
|'(' ' '* predicate ' '* ')'
|NOT predicate
;
term:
'['parameterName']' ' '* IN ' '* '{' ' '* (String|NUMBER) ' '* (',' ' '* (NUMBER|String))* ' '* '}' #inStatment
|'['parameterName']' ' '* Relation ' '* (NUMBER|String) #relationValueStatement
| '['parameterName']' ' '* LIKE' '* (NUMBER|String) #likeStatement
|'['parameterName']' ' '* Relation ' '* '['parameterName']'#relationParaStatement
;
SEMI:[ ]*':'[ ]* {setText(getText().trim());};
IN: ([ ]* 'in' [ ]* | [ ]* 'IN' [ ]*) {setText(getText().trim());};
LIKE:([ ]* ('LIKE'|'like') [ ]*) {setText(getText().trim());};
Relation: ('='|'<>'|'>'|'>='|'<'|'<=' ) {setText(getText().trim());};
IF:[ '\n']* ('IF'|'if') [ '\n']*;
IFNOT:[ '\n']* ('IF NOT'|'if not') [ '\n']*;
THEN:[ '\n']* ('THEN'|'then') [ '\n']*;
ELSE:[ '\n']* ('ELSE'|'else') [ '\n']*;
NOT:[ '\n']* ('NOT'|'not') [ '\n']*;
LogicalOperator:([ '\n']* ('and'|'AND') [ '\n']*)|([ '\n']* ('OR'|'or') [ '\n']*) {setText(getText().trim());};
NUMBER
: '-'? INT '.' INT EXP? // 1.35, 1.35E-9, 0.3, -4.5
| '-'? INT EXP // 1e10 -3e4
| '-'? INT // -3, 45
;
Value:LETTERNoWhiteSpace[-.?!a-zA-Z\u4e00-\u9fa5_0-9\u3002|\uff1f|\uff01|\uff0c|\u3001|\uff1b|\uff1a|\u201c|\u201d|\u2018|\u2019|\uff08|\uff09|\u300a|\u300b|\u3008|\u3009|\u3010|\u3011|\u300e|\u300f|\u300c|\u300d|\ufe43|\ufe44|\u3014|\u3015|\u2026|\u2014|\uff5e|\ufe4f|\uffe5]*(' ')?[-.?!a-zA-Z\u4e00-\u9fa5_0-9\u3002|\uff1f|\uff01|\uff0c|\u3001|\uff1b|\uff1a|\u201c|\u201d|\u2018|\u2019|\uff08|\uff09|\u300a|\u300b|\u3008|\u3009|\u3010|\u3011|\u300e|\u300f|\u300c|\u300d|\ufe43|\ufe44|\u3014|\u3015|\u2026|\u2014|\uff5e|\ufe4f|\uffe5]*{setText(getText().trim());};
String:('"' .*? '"') {setText(getText().trim());};
WS:[ \t\r\n]+ -> skip ;
COMMENT: '#' .*? '\n' ->skip;
fragment INT : '0' | '1'..'9' '0'..'9'* ; // no leading zeros
fragment EXP : [Ee] [+\-]? INT ; // \- since - means "range" inside [...]
fragment
LETTERNoWhiteSpace:[-a-zA-Z\u4e00-\u9fa5_0-9];
The lexer rule Value needs to match all English and Chinese characters, as well as all English and Chinese punctuation, so I used Unicode escapes (starting with \u) to define it.
My input is:
Size: 1, 2, 3, 4, 5
Value: a, b, c, d
IF [Size] > 3 THEN [Value] > "b";
and ANTLR reports:
line 4:12 no viable alternative at input '[Size] > 3 THEN'
I found that 3 THEN is matched as a single token by the lexer rule Value. However, I want 3 to be matched by the rule NUMBER (or String), as in my grammar above, and THEN is a keyword, so it should not be swallowed by Value.
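As far as I understand, this comes from the lexer always taking the longest possible match at the current position, with ties going to the rule defined first. The following Python sketch only imitates that behaviour; it is not ANTLR. The regular expressions are simplified, ASCII-only stand-ins for my THEN, NUMBER and Value rules, and the names TOKEN_RULES and longest_match are made up for this illustration.

```python
import re

# Simplified, ASCII-only stand-ins for three lexer rules from the grammar.
# The list order mirrors the grammar (THEN, then NUMBER, then Value).
TOKEN_RULES = [
    ("THEN", re.compile(r"[ '\n]*(?:THEN|then)[ '\n]*")),
    ("NUMBER", re.compile(
        r"-?(?:0|[1-9][0-9]*)(?:\.(?:0|[1-9][0-9]*))?"
        r"(?:[Ee][+-]?(?:0|[1-9][0-9]*))?")),
    # First character, more characters, one optional space, more characters.
    ("Value", re.compile(r"[-a-zA-Z_0-9][-.?!a-zA-Z_0-9]* ?[-.?!a-zA-Z_0-9]*")),
]


def longest_match(text, pos=0):
    """Return (rule name, matched text) for the longest match at pos.

    On a tie, the rule listed first wins, because only a strictly
    longer match replaces the current best one.
    """
    best_name, best_text = None, ""
    for name, pattern in TOKEN_RULES:
        m = pattern.match(text, pos)
        if m and len(m.group()) > len(best_text):
            best_name, best_text = name, m.group()
    return best_name, best_text


# Value can consume "3", one space, and "THEN", so it beats NUMBER.
print(longest_match("3 THEN [Value]"))     # ('Value', '3 THEN')
# With nothing after the digit, NUMBER and Value tie and NUMBER wins.
print(longest_match("3"))                  # ('NUMBER', '3')
# Starting after the "3", the keyword rule would match as intended.
print(longest_match("3 THEN [Value]", 1))  # ('THEN', ' THEN ')
```

If this model is right, the optional space inside Value is what lets it run past 3 and absorb the keyword.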
How can I change my grammar to solve this problem? Thanks!