I have a Lark grammar equivalent to the following:
STRING_LITERAL: /"[^"]*"/
NUMERIC_LITERAL: /[0-9]+/
stmt: foo | bar | baz
foo: "FOO " any_expression
bar: "BAR " string_expression
baz: "BAZ " numeric_expression
?any_expression: function | STRING_LITERAL | NUMERIC_LITERAL
?string_expression: function | STRING_LITERAL
?numeric_expression: function | NUMERIC_LITERAL
function: /[a-z]+\(\)/
This works, but there's code duplication: STRING_LITERAL and NUMERIC_LITERAL are each mentioned in two places. This gets worse as more types of string expressions and numeric expressions are added, all of which need to go in any_expression as well. I'd like to replace the any_expression line with this one:
?any_expression: string_expression | numeric_expression
In other words, any string_expression or numeric_expression can be used where any_expression is expected. The problem is that function now appears twice, and LALR cannot decide which rule to reduce it to. This results in the following error, even though the final parsing result contains none of the intermediate rules (because of the question marks in front of them) and is therefore not ambiguous:
lark.exceptions.GrammarError: Reduce/Reduce collision in Terminal('$END') between the following rules:
- <numeric_expression : function>
- <string_expression : function>
The following line works, however, which suggests that the problem isn't that function appears twice, but that LALR worries about the intermediate rules even though they are hidden in the final result:
?any_expression: function | STRING_LITERAL | function | NUMERIC_LITERAL
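To illustrate the kind of workaround I mean: the flattened any_expression line above could be generated instead of maintained by hand. This is only a minimal sketch, assuming the grammar is assembled as a Python string before it is handed to Lark and that every alternative is a plain rule or terminal name. The helper inline_rule and the two lists are hypothetical names of mine, not part of Lark.

```python
# Hypothetical workaround: keep each group of alternatives in a Python list
# and generate the rule lines from them. Nothing here is a Lark API.
STRING_ALTERNATIVES = ["function", "STRING_LITERAL"]
NUMERIC_ALTERNATIVES = ["function", "NUMERIC_LITERAL"]


def inline_rule(name, *alternative_lists):
    """Build an inlined ('?') rule line that lists each alternative once."""
    seen = []
    for alternatives in alternative_lists:
        for alternative in alternatives:
            # Skip names already listed (e.g. 'function'), keeping first-seen order.
            if alternative not in seen:
                seen.append(alternative)
    return "?{}: {}".format(name, " | ".join(seen))


grammar_lines = [
    inline_rule("any_expression", STRING_ALTERNATIVES, NUMERIC_ALTERNATIVES),
    inline_rule("string_expression", STRING_ALTERNATIVES),
    inline_rule("numeric_expression", NUMERIC_ALTERNATIVES),
]

# This fragment would be concatenated with the rest of the grammar text.
grammar_fragment = "\n".join(grammar_lines)
print(grammar_fragment)
```

This removes the duplication from the source, but it moves part of the grammar into Python, which is exactly the sort of complication I would like to avoid.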
There are workarounds, but they complicate the code. Is there some way to tell Lark to define a new rule as the combination of two existing rules?