I've been tasked with creating a grammar for a legacy DSL that's been in use for over 20 years. I've been told that the original parser was a mess of regular expressions.

The syntax is generally of the "if this variable is n then set that variable to m" style.

My grammar works for almost all cases, but there are a few places where it baulks because of a (mis)use of the && (logical and) operator.

My Lark grammar (which is LALR(1)) is:

?start: statement*

?statement: expression ";"

?expression : assignment_expression

?assignment_expression : conditional_expression
                       | primary_expression assignment_op assignment_expression

?conditional_expression : logical_or_expression
                        | logical_or_expression "?" expression (":" expression)?

?logical_or_expression : logical_and_expression
                       | logical_or_expression "||" logical_and_expression

?logical_and_expression : equality_expression
                        | logical_and_expression "&&" equality_expression

?equality_expression : relational_expression
                     | equality_expression equals_op relational_expression
                     | equality_expression not_equals_op relational_expression

?relational_expression : additive_expression
                       | relational_expression less_than_op additive_expression
                       | relational_expression greater_than_op additive_expression
                       | relational_expression less_than_eq_op additive_expression
                       | relational_expression greater_than_eq_op additive_expression

?additive_expression : multiplicative_expression
                     | additive_expression add_op multiplicative_expression
                     | additive_expression sub_op multiplicative_expression

?multiplicative_expression : primary_expression
                           | multiplicative_expression mul_op primary_expression
                           | multiplicative_expression div_op primary_expression
                           | multiplicative_expression mod_op primary_expression

?primary_expression : variable
                    | variable "[" INT "]"    -> array_accessor
                    | ESCAPED_STRING
                    | NUMBER
                    | unary_op expression
                    | invoke_expression
                    | "(" expression ")"

invoke_expression : ID ("." ID)* "(" argument_list? ")"
argument_list : expression ("," expression)*

unary_op : "-" -> negate_op
         | "!" -> invert_op
assignment_op : "="
add_op : "+"
sub_op : "-"
mul_op : "*"
div_op : "/"
mod_op : "%"
equals_op : "=="
not_equals_op : "!="
greater_than_op : ">"
greater_than_eq_op : ">="
less_than_op : "<"
less_than_eq_op : "<="

ID : CNAME | CNAME "%%" CNAME

?variable : ID
    | ID "@" ID           -> namelist_id
    | ID "@" ID "@" ID    -> exptype_id
    | "$" ID              -> environment_id

%import common.WS
%import common.ESCAPED_STRING
%import common.CNAME
%import common.INT
%import common.NUMBER
%import common.CPP_COMMENT

%ignore WS
%ignore CPP_COMMENT

And some working examples are:

(a == 2) ? (c = 12);
(a == 2 && b == 3) ? (c = 12);
(a == 2 && b == 3) ? (c = 12) : d = 13;
(a == 2 && b == 3) ? ((c = 12) && (d = 13));

But there are a few places where I see this construct:

(a == 2 && b == 3) ? (c = 12 && d = 13);

That is, the two assignments are joined by && but aren't in parentheses, and the parser rejects the second assignment operator. I assume this is because it's trying to parse it as (c = (12 && d) = 13).
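
To illustrate that assumption, here is a minimal hand-written sketch in plain Python. It is not the Lark parser; it only mirrors three layers of the grammar above (assignment, `&&`, `==`) plus parentheses. The names `tokenize` and `Parser` are hypothetical and exist only for this example. Because `&&` binds tighter than `=`, the right-hand side of the first assignment swallows `12 && d`, and the second `=` then has nothing it can legally attach to.

```python
import re

# Hypothetical tokenizer for this sketch: names/numbers, "==", "&&", "=", parentheses.
TOKEN = re.compile(r"\s*(==|&&|=|\(|\)|\w+)")


def tokenize(text):
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"cannot tokenize at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.assignment()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return tree

    # assignment_expression: primary "=" assignment_expression | logical_and
    def assignment(self):
        left = self.logical_and()
        if self.peek() != "=":
            return left
        # "=" is only legal when the left side is a primary expression.
        if not (isinstance(left, str) or left[0] == "paren"):
            raise SyntaxError(f"'=' after non-primary {left!r}")
        self.take()
        return ("=", left, self.assignment())

    # logical_and_expression: equality ("&&" equality)*
    def logical_and(self):
        left = self.equality()
        while self.peek() == "&&":
            self.take()
            left = ("&&", left, self.equality())
        return left

    # equality_expression: primary ("==" primary)*
    def equality(self):
        left = self.primary()
        while self.peek() == "==":
            self.take()
            left = ("==", left, self.primary())
        return left

    # primary_expression: name or number | "(" expression ")"
    def primary(self):
        token = self.take()
        if token == "(":
            inner = self.assignment()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return ("paren", inner)
        if token is None or not re.fullmatch(r"\w+", token):
            raise SyntaxError(f"unexpected {token!r}")
        return token


for text in ["(c = 12) && (d = 13)", "c = 12 && d = 13"]:
    try:
        print(text, "->", Parser(text).parse())
    except SyntaxError as error:
        print(text, "-> SyntaxError:", error)
```

The parenthesised form parses because each `(…)` is a primary expression, so `&&` joins two complete assignments. The bare form fails at the second `=`, which is the same place the real grammar gives up.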

I've tried changing the order of the rules (this is my first non-toy DSL, so there's been a lot of trial and error), but I either get similar errors or the precedence is wrong. Switching to the Earley algorithm doesn't fix it either.

MerseyViking
  • Moving the `assignment_expression` between `relational_expression` and `additive_expression` does not fix it? How does the grammar look after you tried that? – MegaIng Jan 04 '23 at 15:10
  • Does the language allow `||` between assignments or only `&&`? – rici Jan 04 '23 at 15:13
  • @MegaIng Sadly it doesn't make a difference - it still doesn't like the second assignment. – MerseyViking Jan 04 '23 at 15:54
  • @rici I've not looked at all of the extant code, but it seems like only `&&` is used between assignments. – MerseyViking Jan 04 '23 at 15:55
  • @MerseyViking: can't you experiment with the existing code? The fact that a construct is not used doesn't necessarily make it illegal, and this problem will be much harder if `||` is not allowed. – rici Jan 04 '23 at 15:58
  • @rici Because there aren't any formal rules written down (that still exist anyway), it's hard to know what's allowed and what's not. The existing code is many thousands of lines accumulated by dozens of people over decades. That said, I can't find any *use* of `||` so I think I could argue to the client that it is allowed. In this case `&&` is only used because the DSL lacks code blocks, so it was just a way of concatenating multiple statements; the result of the logic operation is ignored. I'd be interested in hearing any suggestions you have, assuming `||` is effectively equivalent to `&&`. – MerseyViking Jan 04 '23 at 16:04

1 Answer

Instead of:

?assignment_expression : conditional_expression
                       | primary_expression assignment_op assignment_expression

?conditional_expression : logical_or_expression
                        | logical_or_expression "?" expression (":" expression)?

?logical_or_expression : logical_and_expression
                       | logical_or_expression "||" logical_and_expression

?logical_and_expression : equality_expression
                        | logical_and_expression "&&" equality_expression

?equality_expression : relational_expression
                     | equality_expression equals_op relational_expression
                     | equality_expression not_equals_op relational_expression

?relational_expression : additive_expression
                       | relational_expression less_than_op additive_expression
                       | relational_expression greater_than_op additive_expression
                       | relational_expression less_than_eq_op additive_expression
                       | relational_expression greater_than_eq_op additive_expression

?additive_expression : multiplicative_expression
                     | additive_expression add_op multiplicative_expression
                     | additive_expression sub_op multiplicative_expression

?multiplicative_expression : primary_expression
                           | multiplicative_expression mul_op primary_expression
                           | multiplicative_expression div_op primary_expression
                           | multiplicative_expression mod_op primary_expression

try:

?assignment_expression : conditional_expression
                       | primary_expression assignment_op expression

?conditional_expression : logical_or_expression
                        | logical_or_expression "?" expression (":" expression)?

?logical_or_expression : logical_and_expression
                       | logical_or_expression "||" expression

?logical_and_expression : equality_expression
                        | logical_and_expression "&&" expression

?equality_expression : relational_expression
                     | equality_expression equals_op expression
                     | equality_expression not_equals_op expression

?relational_expression : additive_expression
                       | relational_expression less_than_op expression
                       | relational_expression greater_than_op expression
                       | relational_expression less_than_eq_op expression
                       | relational_expression greater_than_eq_op expression

?additive_expression : multiplicative_expression
                     | additive_expression add_op expression
                     | additive_expression sub_op expression

?multiplicative_expression : primary_expression
                           | multiplicative_expression mul_op expression
                           | multiplicative_expression div_op expression
                           | multiplicative_expression mod_op expression

Bart Kiers
  • Thanks for the help @Bart, but that parses this statement: `($STATUS=1 && $ERR=$NO_BUILD_EXPERIMENT_DIR)` as `($STATUS=(1 && $ERR=$NO_BUILD_EXPERIMENT_DIR))`, so `$STATUS` is set to a boolean. It's actually a moot point now: I agreed with the customer this morning that the code will be fixed rather than modifying the grammar to accommodate 9 ambiguous statements out of a total of 3300. – MerseyViking Jan 09 '23 at 12:29
  • Ah, yes, operator precedence changes then... – Bart Kiers Jan 09 '23 at 13:06
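
Since the outcome was to fix the offending statements rather than the grammar, a heuristic search can help locate them. The following is a minimal sketch and not part of the original thread: the function name `find_suspect_statements` and the `VAR` pattern are assumptions, loosely modelled on the `variable` rule in the question's grammar. It flags any line where `&&` is followed directly by a variable and a single `=`, and it is naive about `//` appearing inside string literals.

```python
import re

# Assumed approximation of the grammar's `variable` rule:
# optional "$", an identifier, optional "%%name", up to two "@name" parts,
# and an optional "[index]".
VAR = r"\$?[A-Za-z_]\w*(?:%%\w+)?(?:@\w+){0,2}(?:\s*\[\s*\d+\s*\])?"

# "&&" followed directly by a variable and a single "=" (not "==").
BARE_ASSIGN_AFTER_AND = re.compile(r"&&\s*" + VAR + r"\s*=(?!=)")


def find_suspect_statements(source):
    """Yield (line_number, line) where an unparenthesised assignment follows &&."""
    for number, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # drop C++-style comments (naive about strings)
        if BARE_ASSIGN_AFTER_AND.search(code):
            yield number, line.strip()


sample = """\
(a == 2 && b == 3) ? (c = 12);
(a == 2 && b == 3) ? ((c = 12) && (d = 13));
(a == 2 && b == 3) ? (c = 12 && d = 13);
($STATUS=1 && $ERR=$NO_BUILD_EXPERIMENT_DIR);
"""

for number, line in find_suspect_statements(sample):
    print(f"line {number}: {line}")
```

On the sample above, only the third and fourth lines are flagged. The first two pass because `&&` is followed either by a comparison (`==`) or by an opening parenthesis.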