I'm trying to use Instaparse on a DIMACS file less than 700 KB in size, with the following grammar:

<file>=<comment*> <problem?> clause+
comment=#'c.*'
problem=#'p\s+cnf\s+\d+\s+\d+\s*'
clause=literal* <'0'>
<literal>=#'[1-9]\d*'|#'-\d+'

and calling it like so:

(def parser
  (insta/parser (clojure.java.io/resource "dimacs.bnf") :auto-whitespace :standard))
...
(time (parser (slurp filename)))

and it's taking about a hundred seconds. That's three orders of magnitude slower than I was hoping for. Is there some way to speed it up, some way to tweak the grammar or some option I'm missing?

rwallace

2 Answers

The grammar is wrong. It can't be satisfied.

  • Every file ends with a clause.
  • Every clause ends with a '0'.
  • The literal in the clause, being a greedy regex, will eat the final '0'.

Conclusion: No clause will ever be found.

For example ...

=> (parser "60")
Parse error at line 1, column 3:
60
  ^
Expected one of:
"0"
#"\s+"
#"-\d+"
#"[1-9]\d*"

We can parse a literal

=> (parser "60" :start :literal)
("60")

... but not a clause

=> (parser "60" :start :clause)
Parse error at line 1, column 3:
60
  ^
Expected one of:
"0" (followed by end-of-string)
#"\s+"
#"-\d+"
#"[1-9]\d*"
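
The greedy match can also be seen without Instaparse, using only Clojure's built-in regex functions. This is a minimal sketch: `positive-literal` and `remainder-after-literal` are hypothetical names made up for illustration, and the code only mimics matching a literal at the front of the input, not Instaparse's actual machinery.

```clojure
;; Plain Clojure/Java regex calls; Instaparse is not involved here.
(def positive-literal #"[1-9]\d*")   ; same pattern as the grammar's first literal alternative

;; Hypothetical helper: returns the text left over once the literal
;; pattern has matched at the front of s, or nil if it does not match.
(defn remainder-after-literal [s]
  (let [m (re-matcher positive-literal s)]
    (when (.lookingAt m)
      (subs s (.end m)))))

;; The greedy \d* takes every digit it can, including the trailing 0.
(re-find positive-literal "60")     ; => "60"
(remainder-after-literal "60")      ; => ""   nothing is left for the '0' terminator
(remainder-after-literal "6 0")     ; => " 0" the terminator survives when separated by white space
```

A regex never gives back characters it has already matched, so in "60" the clause's closing '0' has nothing left to match.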

Why is it so slow?

If there is a comment:

  • it can swallow the whole file;
  • or be broken at any 'c' character into successive comments;
  • or terminate at any point after the initial 'c'.

This implies that every tail has to be presented to the rest of the grammar, which includes a reg-exp for literal that Instaparse can't see inside. Hence all have to be tried, and all will ultimately fail. No wonder it's slow.


I suspect that this file is actually divided into lines, and that your problems arise from trying to conflate newlines with other forms of white-space.
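
To illustrate what treating the file as lines could look like, here is a rough sketch that sidesteps Instaparse entirely and uses only `clojure.string`. It is a swapped-in technique, not a fix for the grammar. It assumes every clause sits on a single line and ends with 0, that comment lines start with 'c' and the problem line starts with 'p'; `parse-line` and `parse-dimacs` are hypothetical names.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: turn one line into a vector of literals,
;; or nil for blank, comment ('c') and problem ('p') lines.
(defn parse-line [line]
  (let [line (str/trim line)]
    (cond
      (str/blank? line)           nil
      (str/starts-with? line "c") nil   ; comment line, skipped
      (str/starts-with? line "p") nil   ; problem line, ignored in this sketch
      :else (->> (str/split line #"\s+")
                 (map #(Long/parseLong %))
                 (take-while (complement zero?)) ; stop at the terminating 0
                 vec))))

;; Assumes one clause per line; clauses spanning several lines are not handled.
(defn parse-dimacs [text]
  (->> (str/split-lines text)
       (keep parse-line)
       vec))

(parse-dimacs "c demo\np cnf 3 2\n1 -2 0\n2 3 0\n")
;; => [[1 -2] [2 3]]
```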

May I gently point out that playing with a few tiny examples, which is all I've done, might have saved you a good deal of trouble.

Thumbnail

I think that your extensive use of * is causing the problem. Your grammar is too ambiguous/ambitious (I guess). I would check two things:

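;; run it with a small input; note that insta/parses takes
;; the parser built by insta/parser, not the grammar text
(insta/parses parser input)

That will show you how much ambiguity there is in your grammar definition: it returns every possible parse rather than just the first one. See the section on ambiguous grammars in the Instaparse documentation.

Read Engelberg's performance notes for Instaparse. They should help you understand your problem and probably find out what fits best for you.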

carocad