I am trying to write a grammar for SMT formulae, and this is what I have so far:

grammar Z3input;

startRule : formulaList? EOF;

LEFT_PAREN : '(';
RIGHT_PAREN : ')';
COMMA : ',';
SEMICOLON : ';';

PLUS : '+';
MINUS : '-';
TIMES : '*';
DIVIDE : '/';

DIGIT : [0-9];
INTEGER : '0' | [1-9] DIGIT*;
FLOAT : DIGIT+ '.' DIGIT+;
NUMERICAL_LITERAL : FLOAT | INTEGER;
BOOLEAN_LITERAL : 'True' | 'False';
LITERAL : MINUS? NUMERICAL_LITERAL | BOOLEAN_LITERAL;

COMPARISON_OPERATOR : '>' | '<' | '>=' | '<=' | '!=' | '==';
WHITESPACE: [ \t\n\r]+ -> skip;
IDENTIFIER : [a-uw-zB-DF-Z]+ ([a-zA-Z0-9]? [a-uw-zB-DF-Z])*; // omits 'v', 'A', 'E' and cannot end in those characters

IMPLIES : '->' | '-->' | 'implies';
AND : '&' | 'and' | '^';
OR : 'or' | 'v' | '|';
NOT : '~' | '!' | 'not';
QUANTIFIER : 'A' | 'E' | 'forall' | 'exists';

formulaList : formula ( SEMICOLON formula )*; 
argumentList : expression ( COMMA expression )*; 

formula : formulaConjunction 
        | LEFT_PAREN formula RIGHT_PAREN OR LEFT_PAREN formulaConjunction RIGHT_PAREN
        | formula IMPLIES LEFT_PAREN formulaConjunction RIGHT_PAREN;

formulaConjunction : formulaNegation | formulaConjunction AND formulaNegation;
formulaNegation : formulaAtom | NOT formulaNegation;
formulaAtom : BOOLEAN_LITERAL 
        | IDENTIFIER ( LEFT_PAREN argumentList? RIGHT_PAREN )?
        | QUANTIFIER '.' LEFT_PAREN formulaAtom RIGHT_PAREN
        | compareExpn;

expression : boolConjunction | expression OR boolConjunction;
boolConjunction : boolNegation | boolConjunction AND boolNegation;
boolNegation : compareExpn | NOT boolNegation;

compareExpn : arithExpn COMPARISON_OPERATOR arithExpn;
arithExpn : term | arithExpn PLUS term | arithExpn MINUS term;

term : factor | term TIMES factor | term DIVIDE factor;
factor : primary | MINUS factor;

primary : LITERAL 
        | IDENTIFIER ( LEFT_PAREN argumentList? RIGHT_PAREN )? 
        | LEFT_PAREN expression RIGHT_PAREN;

SMT formulae are formulae of first-order logic with function symbols (identifiers that can be called with any number of arguments), variables, comparisons between Boolean literals (i.e. 'True' or 'False'), numeric literals, function calls, or variables, and arithmetic with the operators '+', '*', '-', and '/'. Essentially, these formulae are first-order logic over some signature, and for my purposes I have chosen this signature to be the theory of rationals.

I can get a proper interpretation of something like 'True ^ True', but anything more complicated, including even 'True | True', always seems to result in something along the lines of:

... mismatched input '|' expecting {<EOF>, ';', IMPLIES, AND}

so I would like some help correcting the grammar. For the record, I would prefer to keep the grammar runtime-independent.

  • The right hint was already posted by @AnthonyZhang. As a general hint, I would suggest avoiding these left-recursive patterns. `formulaConjunction : formulaNegation | formulaConjunction AND formulaNegation;` is equivalent to `formulaConjunction : formulaNegation (AND formulaNegation)*;`, but the latter is more concise and not left-recursive. Most of your rules can be transformed this way. – CoronA May 07 '15 at 05:15
  • Thank you for the additional suggestion, but I am going to keep my initial form for `formulaConjunction` for left-associativity – Francesco Gramano May 07 '15 at 16:31

1 Answer

Your `formula` rule seems to be causing the issue here, specifically the alternative `LEFT_PAREN formula RIGHT_PAREN OR LEFT_PAREN formulaConjunction RIGHT_PAREN`.

That alternative says that a disjunction is accepted only in the form `(FORMULA) | (CONJUNCTION)`, with both operands wrapped in parentheses, which is why an unparenthesized 'True | True' is rejected.

Instead, specify precedence rules for each operator, and use a nonterminal for each level of precedence. For example, your grammar might look something like the following:

formula            : (QUANTIFIER IDENTIFIER '.')? formulaImplication;
formulaImplication : formulaConjunction (IMPLIES formula)?;
formulaConjunction : formulaDisjunction (AND formulaConjunction)?;
formulaDisjunction : formulaNegation (OR formulaDisjunction)?;
formulaNegation    : formulaAtom | NOT formulaNegation;
formulaAtom        : BOOLEAN_LITERAL | IDENTIFIER ( LEFT_PAREN argumentList? RIGHT_PAREN )? | LEFT_PAREN formula RIGHT_PAREN | compareExpn;

expression : boolConjunction | expression OR boolConjunction;
boolConjunction : boolNegation | boolConjunction AND boolNegation;
boolNegation : compareExpn | NOT boolNegation;

compareExpn : arithExpn COMPARISON_OPERATOR arithExpn;
arithExpn : term | arithExpn PLUS term | arithExpn MINUS term;

term : factor ((TIMES | DIVIDE) term)?;
factor : primary | MINUS factor;
primary : LITERAL | IDENTIFIER ( LEFT_PAREN argumentList? RIGHT_PAREN )? | LEFT_PAREN expression RIGHT_PAREN;
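
Note that the rules above make OR bind tighter than AND, and that the right-recursive `term` rule groups `a / b / c` as `a / (b / c)`. Here is a minimal sketch of the same one-nonterminal-per-level idea with the more conventional ordering. It makes several assumptions that are mine, not the question's: the grammar targets ANTLR 4 (which the `-> skip` syntax suggests), precedence from tightest to loosest is NOT, AND, OR, IMPLIES, and implication groups to the right. The rule and token names are reused from the question, and the lexer rules are left untouched.

```antlr
// Sketch only: parser rules listed from lowest to highest precedence.
// The precedence order and right-grouping implication are assumptions.
formula            : (QUANTIFIER IDENTIFIER '.')? formulaImplication;
formulaImplication : formulaDisjunction (IMPLIES formula)?;        // a -> b -> c groups as a -> (b -> c)
formulaDisjunction : formulaConjunction (OR formulaConjunction)*;  // OR binds looser than AND
formulaConjunction : formulaNegation (AND formulaNegation)*;
formulaNegation    : NOT formulaNegation | formulaAtom;
formulaAtom        : BOOLEAN_LITERAL
                   | compareExpn
                   | IDENTIFIER (LEFT_PAREN argumentList? RIGHT_PAREN)?
                   | LEFT_PAREN formula RIGHT_PAREN;

// Operands are collected left to right, so a visitor can fold
// a - b - c as (a - b) - c and a / b / c as (a / b) / c.
arithExpn : term ((PLUS | MINUS) term)*;
term      : factor ((TIMES | DIVIDE) factor)*;
```

The `(...)*` form yields a flat list of operands per level rather than a nested tree. If you would rather have the left-nested tree, ANTLR 4 also accepts the directly left-recursive form from the question, such as `arithExpn : term | arithExpn PLUS term | arithExpn MINUS term;`.
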
  • I agree with your conclusion, but your grammar could be improved: `formulaConjunction` and `formulaDisjunction` contain parentheses without semantics. You probably mean `(...)?` instead of `(...)`. – CoronA May 07 '15 at 05:13