
I am writing a parser for an existing language using the textX Python library (based on the Arpeggio PEG parser).

But when I try to use it to parse a file, I get the exception:

RecursionError: maximum recursion depth exceeded while calling a Python object

Here is a minimal example that raises this exception:

#!/usr/bin/env python
from textx import metamodel_from_str

meta_model_string = "Expr: ( Expr '+' Expr ) | INT ;"
model_string      = "1 + 1"

mm = metamodel_from_str(meta_model_string, debug=True)
m = mm.model_from_str(model_string, debug=True)

I tracked it down to Arpeggio's left recursion issue, which states that a rule like A := A B is unsupported and should be converted to a rule without such recursion.

So my question is: is it possible to rewrite the Expr := Expr '+' Expr rule above in a way that does not use left recursion? Note that the real Expr rule is much more complicated. A slightly less simplified version of it would be:

Expr: '(' Expr ')' | Expr '+' Expr | Expr '*' Expr | '!' Expr | INT | STRING ;
Chen Levy

2 Answers

textX author here. In addition to Paul's excellent answer, there is an expression example that should give you a good start.

Top-down parsers in general cannot handle left-recursive rules without hacks like this. If your language is going to be complex and heavily expression-oriented, it might be better to try a bottom-up parser that allows left recursion and provides declarative priority and associativity specification. If you liked textX, I suggest taking a look at parglare, which has similar design goals but uses a bottom-up parsing technique (specifically LR and GLR). Its quick intro example is exactly the language you are building.
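To illustrate why a top-down parser cannot take a left-recursive rule as written, here is a minimal sketch in plain Python. It is not textX or Arpeggio code, and the function names are hypothetical. It translates `Expr: Expr '+' Expr | INT` naively into a recursive-descent function, which calls itself before consuming any input.

```python
def parse_expr_left_recursive(tokens, pos=0):
    """Naive recursive-descent translation of: Expr: Expr '+' Expr | INT ;"""
    # The first alternative starts with Expr itself, so the function recurses
    # at the same position without consuming a token; pos never advances.
    left, pos = parse_expr_left_recursive(tokens, pos)
    # The lines below are never reached.
    if pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_expr_left_recursive(tokens, pos + 1)
        return [left, "+", right], pos
    return left, pos


def hits_recursion_limit(tokens):
    """Return True if parsing the tokens exhausts Python's call stack."""
    try:
        parse_expr_left_recursive(tokens)
    except RecursionError:
        return True
    return False
```

Calling `hits_recursion_limit(["1", "+", "1"])` ends in the same kind of `RecursionError` as in the question, because the input position is identical on every nested call.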

In this post I blogged about the rationale for starting the parglare project and its differences from textX/Arpeggio.

Igor Dejanović

This is more typically written as:

Expr: Term (addop Term)* ;
Term: Factor (multop Factor)* ;
Factor: INT | STRING | '(' Expr ')' ;
multop: '*' | '/' ;
addop: '+' | '-' ;

Now Expr will not directly recurse to itself until first matching a leading '('. You will also get groups that correspond to precedence of operations. (Note that the repetition for Expr and Term will end up producing groups like ['1', '+', '1', '+', '1'], when you might have expected [['1', '+', '1'], '+', '1'] which is what a left-recursive parser will give you.)
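The flat-versus-nested grouping described above can be sketched with a small hand-written parser in plain Python. This is not textX or pyparsing code: all names here are hypothetical, STRING is left out for brevity, and `fold_left` is one possible way to rebuild left-associative groups from the flat repetition result.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into integer, operator and parenthesis tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_factor(tokens, pos):
    # Factor: INT | '(' Expr ')'
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        node, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return node, pos + 1
    if tok.isdigit():
        return tok, pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


def parse_repetition(tokens, pos, operand, operators):
    # operand (operator operand)*  : a loop replaces the left recursion
    first, pos = operand(tokens, pos)
    items = [first]
    while pos < len(tokens) and tokens[pos] in operators:
        op = tokens[pos]
        right, pos = operand(tokens, pos + 1)
        items += [op, right]
    return (items if len(items) > 1 else first), pos


def parse_term(tokens, pos):
    return parse_repetition(tokens, pos, parse_factor, {"*", "/"})


def parse_expr(tokens, pos):
    return parse_repetition(tokens, pos, parse_term, {"+", "-"})


def parse(text):
    tokens = tokenize(text)
    node, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return node


def fold_left(node):
    """Regroup a flat [a, op, b, op, c] list as [[a, op, b], op, c]."""
    if not isinstance(node, list):
        return node
    result = fold_left(node[0])
    for i in range(1, len(node), 2):
        result = [result, node[i], fold_left(node[i + 1])]
    return result
```

Here `parse("1 + 1 + 1")` yields the flat list `['1', '+', '1', '+', '1']`, and `fold_left` applied to that result gives `[['1', '+', '1'], '+', '1']`. Doing the regrouping as a separate pass keeps the grammar free of left recursion while still producing the tree shape a left-recursive parser would build.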

PaulMcG
  • This solution will not accept the original language specification, for example `1 + 1 + 1` will be rejected. – Chen Levy Oct 15 '18 at 11:52
  • Yes, thanks! I can add repetition in the rules, but then the parsed structures look like ['1', '+', '1', '+', '1'] when people typically expect [['1', '+', '1'], '+', '1']. If I change to something like Expr: Term addop Expr, then operations are grouped right-associatively, and "1 - 2 + 3" gets incorrectly parsed as ["1", "-", ["2", "+", "3"]], which would evaluate to -4 instead of the desired 2. To address your point, I'll add repetition, with the caveat – PaulMcG Oct 15 '18 at 22:05
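The associativity point in the comment above can be checked with a short sketch in plain Python. The `evaluate` helper and the two hand-built trees are hypothetical and only illustrate how the grouping changes the result of "1 - 2 + 3".

```python
import operator

OPS = {"+": operator.add, "-": operator.sub}


def evaluate(node):
    """Evaluate a nested [left, op, right] tree whose leaves are digit strings."""
    if isinstance(node, str):
        return int(node)
    left, op, right = node
    return OPS[op](evaluate(left), evaluate(right))


# Grouping a left-recursive (left-associative) grammar would produce.
left_assoc = [["1", "-", "2"], "+", "3"]

# Grouping produced by a right-recursive rule such as Expr: Term addop Expr.
right_assoc = ["1", "-", ["2", "+", "3"]]
```

Evaluating `left_assoc` computes (1 - 2) + 3, while `right_assoc` computes 1 - (2 + 3), which is why the right-recursive rewrite parses the input but gives the wrong answer.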