I am trying to figure out some details involving parsing expression grammars, and am stuck on the following question:
For the given grammar:
a = b Z
b = Z Z | Z
(where lower-case letters indicate productions, and uppercase letters indicate terminals). Is the production "a" supposed to match against the string "Z Z"?
Here is the pseudo-code that I've seen the above grammar get translated to, where each production is mapped to a function that outputs two values. The first indicates whether the parse succeeded. And the second indicates the resulting position in the stream after the parse.
defn parse-a (i:Int) -> [True|False, Int] :
val [r1, i1] = parse-b(i)
if r1 : eat("Z", i1)
else : [false, i]
defn parse-b1 (i:Int) -> [True|False, Int] :
val [r1, i1] = eat("Z", i)
if r1 : eat("Z", i1)
else : [false, i]
defn parse-b2 (i:Int) -> [True|False, Int] :
eat("Z", i)
defn parse-b (i:Int) -> [True|False, Int] :
val [r1, i1] = parse-b1(i)
if r1 : [r1, i1]
else : parse-b2(i)
The above code will fail when trying to parse the production "a" on the input "Z Z". This is because the parsing function for "b" is incorrect. It will greedily consume both Z's in the input and succeed, and then leave nothing left for a to parse. Is this what a parsing expression grammar is supposed to do? The pseudocode in Ford's thesis seems to indicate this.
Thanks very much.
-Patrick