
I have something equivalent to the following code:

STRING_LITERAL: /"[^"]*"/
NUMERIC_LITERAL: /[0-9]+/

stmt: foo | bar | baz
foo: "FOO " any_expression
bar: "BAR " string_expression
baz: "BAZ " numeric_expression

?any_expression: function | STRING_LITERAL | NUMERIC_LITERAL

?string_expression: function | STRING_LITERAL
?numeric_expression: function | NUMERIC_LITERAL

function: /[a-z]+\(\)/

This works, but it duplicates code: STRING_LITERAL and NUMERIC_LITERAL each appear in two rules. The duplication grows as more kinds of string and numeric expressions are added, because every one of them must also be added to any_expression. I'd like to replace the any_expression line with this one:

?any_expression: string_expression | numeric_expression

In other words, any string_expression or numeric_expression can be used where an any_expression is expected. The problem is that function is now reachable through two different rules, and the LALR parser cannot decide which of them to reduce it to. That produces the following error, even though neither intermediate rule would appear in the final tree (both are inlined because of the ? prefix), so the final parse result is not ambiguous:

lark.exceptions.GrammarError: Reduce/Reduce collision in Terminal('$END') between the following rules:
        - <numeric_expression : function>
        - <string_expression : function>

This version, however, works. That suggests the problem is not that function appears twice, but that LALR takes the intermediate rules into account even though they are hidden in the final result:

?any_expression: function | STRING_LITERAL | function | NUMERIC_LITERAL

There are workarounds, but they complicate the code. Is there some way to tell lark to create a new rule whose alternatives are the combination of two other rules' alternatives?

FrederikVds
`?` is only a post-processing step; it can't have any effect on parsing. There is currently no way to avoid some duplication. – MegaIng Feb 28 '23 at 19:33
