Why is the ordered choice of toChoiceParser() ignored when adding a plus() parser?

Question

I am stuck at one point with the Dart package petitparser: it seems that the "priority rule" ("parse p1; if that doesn't work, parse p2", i.e. ordered choice) is ignored by toChoiceParser() once a plus() parser is added.

import 'package:petitparser/petitparser.dart';

// This parser should check from left to right if a nestedTerm, e.g. '(0)' or '(()', exists.
// If this is not the case, then it looks if a singleCharacter exists, either '(', ')' or '0' (lower priority).
// In case 1 everything works perfectly. But if the process is repeated any number of times, as in case 2,
// then it seems that it no longer recognizes that a nestedTerm exists and that this should actually lead
// to the same terminal output as in case 1 due to the higher priority. Where is my fallacy?

void main() {
  final definition = ExpressionDefinition();
  final parser = definition.build();
  print(parser.parse('(0)').toString());
  // Terminal output in case 1: ['(' (nestedTerm), '0' (singleCharacter), ')' (nestedTerm)]
  // Terminal output in case 2: ['(' (singleCharacter), '0' (singleCharacter), ')' (singleCharacter)]
}

class ExpressionDefinition extends GrammarDefinition {
  @override
  Parser start() => ref0(term).end();
  // Case 1 (parses only once):
  Parser term() => ref0(nestedTerm) | ref0(singleCharacter);
  // Case 2 (parses one or more times):
  // Parser term() => (ref0(nestedTerm) | ref0(singleCharacter)).plus();
  Parser nestedTerm() =>
      (char('(')).map((value) => "'$value' (nestedTerm)") &
      ref0(term) &
      char(')').map((value) => "'$value' (nestedTerm)");
  Parser singleCharacter() =>
      char('(').map((value) => "'$value' (singleCharacter)") |
      char(')').map((value) => "'$value' (singleCharacter)") |
      char('0').map((value) => "'$value' (singleCharacter)");
}

However, for my current project, the "priority rule" should also work in this case (in this example case 2).

Can anyone find my fallacy? Thanks a lot for your support!

score 1 · Accepted Answer · answered Aug 06 '22 at 16:00

Probably the easiest way to understand what is going on is to compare the parse traces of the two parsers (see also the section on debugging grammars I recently added):

import 'package:petitparser/debug.dart';

void main() {
  ...
  trace(parser).parse('(0)');
}

You will see that in case 2 the nested-term is correctly started, but then for the inside of the nested-term the plus() parser eagerly consumes the remaining input characters 0 and ). This then causes the outer nested-term to fail because it cannot be completed with a ) anymore. As a consequence the complete input is consumed using single-characters.

From the examples given, it is not entirely clear what you expect to get. Removing char(')') from the singleCharacter parser would solve the issue described.
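
A minimal sketch of that suggestion, assuming the grammar from the question in its case 2 form; the class name `BalancedDefinition` is hypothetical. With `)` no longer accepted as a single character, the inner `plus()` has to stop in front of the closing parenthesis, so the nested term can complete:

```dart
import 'package:petitparser/petitparser.dart';

// Hypothetical variant of the question's grammar (case 2) with char(')')
// removed from singleCharacter, as suggested in the answer.
class BalancedDefinition extends GrammarDefinition {
  @override
  Parser start() => ref0(term).end();

  // One or more nested terms or single characters.
  Parser term() => (ref0(nestedTerm) | ref0(singleCharacter)).plus();

  Parser nestedTerm() =>
      char('(').map((value) => "'$value' (nestedTerm)") &
      ref0(term) &
      char(')').map((value) => "'$value' (nestedTerm)");

  // No ')' alternative here, so plus() cannot swallow the closing parenthesis.
  Parser singleCharacter() =>
      char('(').map((value) => "'$value' (singleCharacter)") |
      char('0').map((value) => "'$value' (singleCharacter)");
}

void main() {
  final parser = BalancedDefinition().build();
  // Both parentheses should now be tagged as nestedTerm.
  print(parser.parse('(0)').value);
}
```

Because `plus()` returns a list, the value is nested one level deeper than in case 1. The trade-off is that an unbalanced `)` such as in `())` is no longer accepted at all.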

Lukas Renggli

***From the examples given it is not entirely clear what you expect to get?*** I thought that the plus() parser repeats the process as often as possible *(but at least once)*. This is why I thought that *"ref0(nestedTerm) | ref0(singleCharacter)"* and *"(ref0(nestedTerm) | ref0(singleCharacter)).plus()"* are equivalent for an input like '(0)': **Output case 1:** ['(' (nestedTerm), '0' (singleCharacter), ')' (nestedTerm)]; **Output case 2:** ['(' (singleCharacter), '0' (singleCharacter), ')' (singleCharacter)]; **Expected output case 2:** same output as in case 1 – CodingFun Aug 08 '22 at 10:23
***Removing char(')') from the singleCharacter parser would solve issue described.*** This is right. However, I would like to make a math converter *(AsciiMath ↔ LaTeX ↔ MathML ↔ ...)*. Two same single characters of an AsciiMath expression, like "())", could have a different meaning in LaTeX → **')'** or **'\right)'**. – CodingFun Aug 08 '22 at 10:24
Yes, your understanding of the `plus()` parser is correct. However, the repeating parsers consume as much as they can: after consuming the `0`, it also consumes the `)`, which then causes the nested term to fail because it cannot find the corresponding `)` anymore. Have a look at the trace; it explains step by step what happens. – Lukas Renggli Aug 10 '22 at 21:16
Again, I am not exactly sure what you are trying to do; on asciimath.org I don't see an example of unbalanced parentheses. In any case, you could maybe detect the double parenthesis either with `string('))')` or [lookahead](https://pub.dev/documentation/petitparser/latest/parser/AndParserExtension.html) to condition the parsers further? – Lukas Renggli Aug 10 '22 at 21:23
Thanks for your message and your helpful advice. In principle, I was not looking for this example explicitly, but more in general. Anyway, the brackets in AsciiMath are balanced, as you said. However, I do not want to convert it one-to-one, but rather adapt it to my specific use case. For example, AsciiMath does not render the same everywhere, and the converter should automatically fix certain input errors. Moreover, the recognition software I use cannot differentiate between the sizes of the brackets. – CodingFun Aug 12 '22 at 12:30
As a result, a unique conversion is impossible, but it is not necessary for my project. I will definitely have another closer look at your parser - thank you so much for your efforts :)! – CodingFun Aug 12 '22 at 12:30
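
The lookahead idea from the comments above could be sketched as follows. This is one possible reading, not the answerer's code: the class name `LookaheadDefinition` and the rule "treat `)` as a single character only when another `)` follows" are assumptions made for illustration. It uses `and()`, which checks the following input without consuming it:

```dart
import 'package:petitparser/petitparser.dart';

// Hypothetical variant of case 2: ')' counts as a single character only
// when it is directly followed by another ')'.
class LookaheadDefinition extends GrammarDefinition {
  @override
  Parser start() => ref0(term).end();

  Parser term() => (ref0(nestedTerm) | ref0(singleCharacter)).plus();

  Parser nestedTerm() =>
      char('(').map((value) => "'$value' (nestedTerm)") &
      ref0(term) &
      char(')').map((value) => "'$value' (nestedTerm)");

  Parser singleCharacter() =>
      char('(').map((value) => "'$value' (singleCharacter)") |
      // Consume one ')' but only peek at the next one via and().
      (char(')') & char(')').and())
          .map((values) => "'${values[0]}' (singleCharacter)") |
      char('0').map((value) => "'$value' (singleCharacter)");
}

void main() {
  final parser = LookaheadDefinition().build();
  print(parser.parse('(0)').value); // closing ')' is left for nestedTerm
  print(parser.parse('(0))').value); // first ')' single, second ')' nested
}
```

This is only a heuristic. Properly nested input such as `((0))` also has two consecutive `)` characters, so the lookahead fires there too and the outer pair is no longer recognized as a nested term; a real converter would need a more specific condition.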
