
Pyparsing worked fine for a very small grammar, but as the grammar has grown, performance has dropped and memory usage has gone through the roof.

My current grammar is:

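from pyparsing import (Forward, Group, Keyword, LineEnd, Literal, OneOrMore,
                       Optional, Word, ZeroOrMore, alphas, nums)

newline = LineEnd ()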
minus = Literal ('-')
plus = Literal ('+')
star = Literal ('*')
dash = Literal ('/')
dashdash = Literal ('//')
percent = Literal ('%')
starstar = Literal ('**')
lparen = Literal ('(')
rparen = Literal (')')
dot = Literal ('.')
comma = Literal (',')
eq = Literal ('=')
eqeq = Literal ('==')
lt = Literal ('<')
gt = Literal ('>')
le = Literal ('<=')
ge = Literal ('>=')
not_ = Keyword ('not')
and_ = Keyword ('and')
or_ = Keyword ('or')
ident = Word (alphas)
integer = Word (nums)

expr = Forward ()
parenthized = Group (lparen + expr + rparen)
trailer = (dot + ident)
atom = ident | integer | parenthized
factor = Forward ()
power = atom + ZeroOrMore (trailer) + Optional (starstar + factor)
factor << (ZeroOrMore (minus | plus) + power)
term = ZeroOrMore (factor + (star | dashdash | dash | percent) ) + factor
arith = ZeroOrMore (term + (minus | plus) ) + term
comp = ZeroOrMore (arith + (eqeq | le | ge | lt | gt) ) + arith
boolNot = ZeroOrMore (not_) + comp
boolAnd = ZeroOrMore (boolNot + and_) + boolNot
boolOr = ZeroOrMore (boolAnd + or_) + boolAnd
match = ZeroOrMore (ident + eq) + boolOr
expr << match
statement = expr + newline
program = OneOrMore (statement)

When I parse the following:

print (program.parseString ('3*(1+2*3*(4+5))\n') )

it takes quite a long time:

~/Desktop/m2/pyp$ time python3 slow.py 
['3', '*', ['(', '1', '+', '2', '*', '3', '*', ['(', '4', '+', '5', ')'], ')']]

real    0m27.280s
user    0m25.844s
sys     0m1.364s

And the memory usage goes up to 1.7 GiB (sic!).
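
For anyone who wants to reproduce these numbers from inside Python rather than with the shell's `time`, here is a minimal sketch using only the standard library. The helper name `measure` is hypothetical. Note that `tracemalloc` only sees allocations made through Python's allocator and slows the traced code down, so the figures will not match the `time` output or the process's resident memory exactly.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds, peak_traced_bytes).

    `measure` is a hypothetical helper, not part of pyparsing.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        # Collect the figures even if func raises, then stop tracing.
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


# Stand-in workload; with the grammar above one would pass
# program.parseString and the input string instead.
result, seconds, peak_bytes = measure(sorted, range(1000, 0, -1))
print(f"{seconds:.6f} s, peak {peak_bytes} bytes")
```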

Have I made a serious mistake in implementing this grammar, or how else can I keep memory usage within bearable limits?

Hyperboreus

1 Answer


After importing pyparsing, enable packrat parsing to memoize parse behavior:

ParserElement.enablePackrat()

This should make a big improvement in performance.
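
To illustrate why memoization helps with a grammar shaped like the one in the question, here is a toy recognizer in plain Python. It is not pyparsing's implementation, and every name in it (`LEVELS`, `count_calls`, `level`, `atom`) is hypothetical. Each precedence level follows the pattern `ZeroOrMore (lower + op) + lower`, so the last operand at every level is parsed once inside the loop, thrown away when no operator follows, and parsed again. With several levels and nested parentheses, that repeated work multiplies. Caching results per (rule, position) pair, here with `functools.lru_cache` standing in for packrat parsing, means each pair is evaluated only once.

```python
from functools import lru_cache

# Operator characters per precedence level, loosest binding first.
# A hypothetical stand-in for the layered rules in the question.
LEVELS = ["|", "&", "<>", "+-", "*/%"]


def count_calls(text, memoize):
    """Recognise text; return (end_position, number_of_rule_evaluations)."""
    calls = 0

    def level(k, pos):
        nonlocal calls
        calls += 1  # with lru_cache, only cache misses reach this line
        if k == len(LEVELS):
            return atom(pos)
        # ZeroOrMore(lower + op): keep consuming "operand operator" pairs.
        while True:
            end = level(k + 1, pos)
            if end is None or end >= len(text) or text[end] not in LEVELS[k]:
                break  # no operator follows, so this attempt is discarded
            pos = end + 1
        # "+ lower": the final operand is parsed again from the same position.
        return level(k + 1, pos)

    def atom(pos):
        start = pos
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        if pos > start:
            return pos  # matched an integer
        if pos < len(text) and text[pos] == "(":
            end = level(0, pos + 1)
            if end is not None and end < len(text) and text[end] == ")":
                return end + 1  # matched a parenthesised expression
        return None

    if memoize:
        # Rebinding the name makes the recursive calls go through the cache.
        level = lru_cache(maxsize=None)(level)

    return level(0, 0), calls


text = "3*(1+2*3*(4+5))"
plain_end, plain_calls = count_calls(text, memoize=False)
memo_end, memo_calls = count_calls(text, memoize=True)
print("rule evaluations without cache:", plain_calls)
print("rule evaluations with cache:   ", memo_calls)
```

Both runs accept the whole input; only the number of rule evaluations differs. In pyparsing itself, the single `ParserElement.enablePackrat()` call shown above is all that is needed to get this caching.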

PaulMcG
  • For the record, this goes from 3.5 seconds to 0.036 seconds on my computer, nearly a 100-fold improvement. Is there any reason why memoization isn't turned on automatically - does it fail in some edge cases? – Hooked Jan 27 '14 at 15:44
  • @Hooked: For details on packrat parsing with pyparsing, see [this item in the pyparsing FAQ](http://pyparsing-public.wikispaces.com/FAQs#toc3). See also [this SO thread](https://stackoverflow.com/q/1410477/857390) for packrat parsing in general. – Florian Brucker Jul 07 '14 at 09:47
  • With the help of Tal Einat, packrat parsing underwent a big rewrite this past summer, released in version 2.1.6. Increase in speed AND big decrease in memory consumption! – PaulMcG Nov 01 '16 at 17:56