I recently worked on similar research, met the exact same confusion, and resolved it. Regardless if you are still working on this topic, here's my answer.
Your understanding is correct:
- Packrat parser scans input string from left to right
- Packrat parser construct the cache from right to left
But there's just one approach, not two. Let's use one simple example Parsing Expression Grammar (PEG) without left-recursion: E -> Num + E | Num
(Note that, a left-recursion example requires another long explanation, you can refer CPython's implementation for details)
The Syntax Directed Translation (SDT) will be something like:
E -> a=Num + b=E { a + b }
E -> Num { Num }
And we can write a parse_E
function in below:
def parse_E(idx):
if idx in cache['parse_E']:
return cache['parse_E'][idx]
lval, nidx = parse_Char(idx)
if nidx < len(self.tokens):
operator, nnidx = parse_Char(nidx)
if operator == '+':
# E -> Num + E
rval, nnnidx = parse_E(nnidx)
cache['parse_E'][idx] = lval + rval, nnnidx
return cache['parse_E'][idx]
# E -> Num
cache['parse_E'][idx] = lval, nidx
return cache['parse_E'][idx]
According to Byran Ford's paper, the parser needs to scan the input string from left to right and construct the cache in any position:
for idx in len(input_string):
parse_E(idx)
parse_Char(idx)
So, let's check the cache construction under the hood, initially, we have an empty cache and input string:
cache: {'parse_E': {}, 'parse_Char': {}}
input string: `2 + 3 + 4`
The function call happens in the following order when idx=0
. Clearly, we construct the cache from right to left at position 0 (not even to mention idx=1
or above).
parse_Char(Y)
happens earlier than parse_Char(X)
(Y > X)
parse_Char(X)
must happens earlier than parse_E(X)
parse_E(0) --- (E -> Num + E) (pending)
-> parse_Char(0) --- 2 (pending)
-> parse_Char(1) --- + (pending)
-> parse_E(2) --- E (E -> Num + E) (pending)
-> parse_Char(2) --- 3 (pending)
-> parse_Char(3) --- + (pending)
-> parse_E(4) --- E (E -> Num) (pending)
-> parse_Char(4) --- 4 (acc)
# Only after parse_Char(4) succeed and fill into cache, parse_E(4) can be successful...and so on.
If you want to read the full Python example of Packrat parser implementation, you can check my repository. It contains a handmade Packrat parser and a CPython PEG generated Packrat parser based on a simple meta grammar.