2

I'm currently trying to familiarize myself with packrat parsing. So I've read the PDF paper from 2002 linked here and in section 2.3 it describes packrat caching as a preliminary process (which occurs before the actual parsing) in which a full caching table is pre-constructed by reading the input from right to left. Only then, the actual linear parsing from left to right can start.

But in every PEG parser implementation I found, the "cache" option is usually a caching process that occurs during the actual left to right parsing. For example here.

Is there any difference between both approaches? Thank you.

ostrebler
  • 940
  • 9
  • 32

1 Answers1

2

I recently worked on similar research, met the exact same confusion, and resolved it. Regardless if you are still working on this topic, here's my answer.

Your understanding is correct:

  • Packrat parser scans input string from left to right
  • Packrat parser construct the cache from right to left

But there's just one approach, not two. Let's use one simple example Parsing Expression Grammar (PEG) without left-recursion: E -> Num + E | Num

(Note that, a left-recursion example requires another long explanation, you can refer CPython's implementation for details)

The Syntax Directed Translation (SDT) will be something like:

E -> a=Num + b=E { a + b }
E -> Num { Num }

And we can write a parse_E function in below:

def parse_E(idx):
    if idx in cache['parse_E']:
        return cache['parse_E'][idx]

    lval, nidx = parse_Char(idx)
    if nidx < len(self.tokens):
        operator, nnidx = parse_Char(nidx)
        if operator == '+':
            # E -> Num + E
            rval, nnnidx = parse_E(nnidx)
            cache['parse_E'][idx] = lval + rval, nnnidx
            return cache['parse_E'][idx]
    
    # E -> Num
    cache['parse_E'][idx] = lval, nidx
    return cache['parse_E'][idx]

According to Byran Ford's paper, the parser needs to scan the input string from left to right and construct the cache in any position:

for idx in len(input_string):
    parse_E(idx)
    parse_Char(idx)

So, let's check the cache construction under the hood, initially, we have an empty cache and input string:

cache: {'parse_E': {}, 'parse_Char': {}}
input string: `2 + 3 + 4`

The function call happens in the following order when idx=0. Clearly, we construct the cache from right to left at position 0 (not even to mention idx=1 or above).

  • parse_Char(Y) happens earlier than parse_Char(X) (Y > X)
  • parse_Char(X) must happens earlier than parse_E(X)
   parse_E(0)     ---   (E -> Num + E) (pending)
-> parse_Char(0)  --- 2 (pending)
-> parse_Char(1)  --- + (pending)
-> parse_E(2)     --- E (E -> Num + E) (pending)
-> parse_Char(2)  --- 3 (pending)       
-> parse_Char(3)  --- + (pending)
-> parse_E(4)     --- E (E -> Num) (pending)
-> parse_Char(4)  --- 4 (acc)

# Only after parse_Char(4) succeed and fill into cache, parse_E(4) can be successful...and so on.

If you want to read the full Python example of Packrat parser implementation, you can check my repository. It contains a handmade Packrat parser and a CPython PEG generated Packrat parser based on a simple meta grammar.

Kir Chou
  • 2,980
  • 1
  • 36
  • 48
  • 1
    Haha it's been a long time, and when I gave it some more thought, I actually understood the mechanism you describe. I forgot to go back here, but you did it. Thank you! – ostrebler Feb 28 '21 at 11:06