I realize this question is a bit stale at this point, but I stumbled across it while learning tree-sitter myself, and it seemed like an interesting challenge.
Based on the expected output provided, it appears that the intent of the `operator` expression is to consume all expressions that are separated by `\X` tokens. The grammar is ambiguous because it defines the `operator` expression itself as one of the expressions that an `operator` expression can consume. As a result, the parser cannot tell how it should group the expression sequence.
You can convince tree-sitter to generate a valid parser by applying precedence and associativity, but the best this can accomplish is to force the parser to break the sequence into a series of `operator` expressions, each containing at most one `\X` token. Because the `operator` rule is a visible node, instead of the expected result you end up with something like the following (for a left-associative operator):
    (source_file
      (operator
        (operator
          (operator
            (identifier)
            (identifier))
          (identifier))
        (identifier)))
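
For reference, a precedence-based grammar of the kind described above might look like the sketch below. It assumes the original `operator` rule was binary and recursive through `_expression`; the grammar in the question may well differ. The grammar name and the `identifier` pattern are placeholders I made up for illustration.

```javascript
// grammar.js sketch; meant to be run by the tree-sitter CLI, which
// supplies grammar(), seq(), choice() and prec as globals.
module.exports = grammar({
  name: 'example', // hypothetical grammar name

  rules: {
    source_file: $ => $._expression,

    _expression: $ => choice(
      $.operator,
      $.identifier,
    ),

    // prec.left resolves the conflict by grouping to the left, so
    // "a \X b \X c" parses as ((a \X b) \X c): one operator node per \X.
    operator: $ => prec.left(seq($._expression, '\\X', $._expression)),

    identifier: $ => /[A-Za-z_]\w*/, // assumed identifier pattern
  },
});
```

Swapping `prec.left` for `prec.right` would flip the nesting to the right-hand side, but either way every `\X` still gets its own `operator` node.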
However, the expected result indicates that the `operator` expression should never nest at all; it should produce a single `operator` node containing a list of all the delimited expressions. The implication is that the `\X` tokens don't delimit arbitrary expressions as currently defined, but only expressions other than another `operator` expression.
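
To make the difference between the two groupings concrete, here is a small plain-JavaScript sketch. It is not tree-sitter code and does not use any tree-sitter API; the function names are hypothetical, and it simply assumes every operand is an identifier and that there is at least one operand.

```javascript
// Illustration only: builds S-expression strings by hand to compare
// the nested grouping with the flat one.

// Split input such as "a \X b \X c" into operand names, assuming
// operands never contain the two-character delimiter "\X".
function operands(input) {
  return input.split('\\X').map(s => s.trim()).filter(s => s.length > 0);
}

// Binary, left-associative grouping: each operator node holds exactly
// two children, so a chain of operands nests on the left.
function nested(input) {
  const ids = operands(input).map(() => '(identifier)');
  return ids.reduce((left, right) => `(operator ${left} ${right})`);
}

// Flat grouping: a single operator node holds every delimited operand.
function flat(input) {
  const ids = operands(input).map(() => '(identifier)');
  return ids.length > 1 ? `(operator ${ids.join(' ')})` : ids[0];
}

console.log(nested('a \\X b \\X c \\X d'));
console.log(flat('a \\X b \\X c \\X d'));
```

The first call prints the nested shape that precedence alone gives you; the second prints a single `operator` with four `identifier` children, which is the non-nesting shape the expected result calls for.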
Therefore, the simplest solution to the ambiguity seems to be to separate the expressions into two types: the `operator` expression and all other "non-operator" expressions. You can then define the `operator` expression so that it only repeats non-operator expressions. In my testing, the following grammar rules produce the expected output:
    source_file: $ => $._expression,

    _expression: $ => choice(
      $.operator,
      $._non_operator_expression,
    ),

    _non_operator_expression: $ => choice(
      $.identifier,
      // [maybe others]
    ),

    operator: $ => seq(
      repeat1(seq($._non_operator_expression, '\\X')),
      $._non_operator_expression,
    ),