
I am writing a parser that converts LaTeX mathematical formulas into MathML, and I wrote the following grammar for Lemon:

%token BEGIN_GROUP END_GROUP MATH_SHIFT ALIGNMENT_TAB.
%token END_OF_LINE PARAMETER SUPERSCRIPT SUBSCRIPT.
%token SPACE LETTER DIGIT SYMBOL.
%token COMMAND COMMAND_LEFT COMMAND_RIGHT.
%token COMMAND_LIMITS COMMAND_NOLIMITS.
%token BEGIN_ENV END_ENV.
%token NBSP.

/* Some API */

document ::= list.

list ::= list element.
list ::= .

element ::= identifier(Id).
element ::= symbol(O).
element ::= number(Num).

identifier ::= LETTER.

symbol ::= SYMBOL.

number(N) ::= number DIGIT(D). /* Append digit */
number(N) ::= DIGIT(D). /* Init digits */

/* Lexer code */

This grammar is incomplete; it doesn't contain the main program code. Here is the relevant part of Lemon's output:

State 2:
      (2) element ::= number *
          number ::= number * DIGIT

                         DIGIT shift-reduce 3      number ::= number DIGIT
                         DIGIT reduce       2       ** Parsing conflict **
                     {default} reduce       2      element ::= number

This grammar produces one parsing conflict. How can I resolve this conflict?

This is the first parser I have written, so I don't have enough experience to solve this problem on my own.

    The conflict is because your grammar allows consecutive `element`s with nothing between them. Since an `element` could be an arbitrary number of digits, `34` could be parsed either as a single element containing the `number` (34) or as two consecutive `element`s, the first one containing the `number` 3 and the second one the `number` 4. Your best bet is to use a lexer to divide the input into tokens, because character-by-character grammars are rarely possible with a single character lookahead. But you should probably read at least some introductory material, even if it's just Wikipedia. – rici Jan 01 '23 at 15:57
