
I am trying to understand the left-most derivation in the context of the LL parsing algorithm. This link explains it from the generative perspective, i.e., it shows how to follow a left-most derivation to generate a specific token sequence from a set of rules.

But I am thinking about the opposite direction: given a token stream and a set of grammar rules, how do we find the proper steps for applying the rules in a left-most derivation?

Let's continue to use the following grammar from the aforementioned link:

(The grammar was shown as an image; the rules used in the derivations below are:)

N → N D
N → D
D → 1 | 2 | 3

And the given token sequence is: 1 2 3

One way is this:

1 2 3
-> D D D 
-> N D D (rewrite the *left-most* D to N according to the rule N->D.)
-> N D (rewrite the *left-most* N D to N according to the rule N->N D.)
-> N  (same as above.)

But there are other ways to apply the grammar rules:

1 2 3 -> D D D -> N D D -> N N D -> N N N

OR

1 2 3 -> D D D -> N D D -> N N D -> N N

But only the first derivation ends up with a single non-terminal.
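To see how many such reduction sequences there are, here is a small brute-force sketch I put together (Python; the rule encoding and function names are mine, purely for illustration). It enumerates every sequence of single-rule reductions starting from D D D:

# Rules, read backwards as reductions: rhs -> lhs.
RULES = [(('N', 'D'), 'N'), (('D',), 'N'),
         (('1',), 'D'), (('2',), 'D'), (('3',), 'D')]

def reductions(form):
    """Yield every form reachable from `form` by one reduction."""
    for rhs, lhs in RULES:
        for i in range(len(form) - len(rhs) + 1):
            if tuple(form[i:i + len(rhs)]) == rhs:
                yield form[:i] + [lhs] + form[i + len(rhs):]

def paths(form, prefix=()):
    """Yield every maximal reduction sequence starting from `form`."""
    nexts = list(reductions(form))
    if not nexts:                        # no rule applies any more
        yield prefix + (form,)
    for nxt in nexts:
        yield from paths(nxt, prefix + (form,))

for p in paths(['D', 'D', 'D']):         # start from D D D as above
    print(' -> '.join(' '.join(f) for f in p))

It prints every maximal sequence, including dead ends such as N N that can never reach a single non-terminal.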

As the token sequence length increases, there can be many more ways. I think that to infer the proper derivation steps, two prerequisites are needed:

  • a starting/root rule
  • the token sequence

Given these two, what's the algorithm to find the derivation steps? Do we have to make the final result a single non-terminal?


1 Answer

The general process of LL parsing consists of repeatedly:

  • Predict the production for the top grammar symbol on the stack, if that symbol is a non-terminal, and replace that symbol with the right-hand side of the production.

  • Match the top grammar symbol on the stack with the next input symbol, discarding both of them.

The match action is unproblematic but the prediction might require an oracle. However, for the purposes of this explanation, the mechanism by which the prediction is made is irrelevant, provided that it works. For example, it might be that for some small integer k, every possible sequence of k input symbols is only consistent with at most one possible production, in which case you could use a look-up table. In that case, we say that the grammar is LL(k). But you could use any mechanism, including magic. It is only necessary that the prediction always be accurate.
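For concreteness, here is a minimal sketch of that loop (my illustration, not standard library code; the prediction oracle is supplied as a callback, however it happens to be implemented):

def ll_parse(start, is_terminal, predict, tokens):
    """Generic predict/match (LL) loop.

    predict(nonterminal, unconsumed) is the oracle: it returns the
    right-hand side (a list of grammar symbols) to expand with.  How it
    decides -- lookup table, k-token lookahead, magic -- does not matter,
    as long as it is always right.
    """
    stack = [start]               # parse stack; index 0 is the top
    pos = 0                       # number of input tokens consumed
    while stack:
        top = stack[0]
        if is_terminal(top):
            # Match: the top of the stack must equal the next input token.
            if pos < len(tokens) and tokens[pos] == top:
                stack.pop(0)
                pos += 1
            else:
                raise SyntaxError(f"expected {top!r} at token {pos}")
        else:
            # Predict: replace the nonterminal with a right-hand side.
            stack[0:1] = predict(top, tokens[pos:])
    if pos != len(tokens):
        raise SyntaxError("unconsumed input remains")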

At any step in this algorithm, the partially-derived string is the consumed input followed by the contents of the stack. Initially, there is no consumed input and the stack consists solely of the start symbol, so the initial partially-derived string is just the start symbol (with 0 derivation steps applied). Since the consumed input consists solely of terminals and the algorithm only ever modifies the top (first) element of the stack, it is clear that the series of partially-derived strings constitutes a leftmost derivation.

If the parse is successful, the entire input will be consumed and the stack will be empty, so the parse results in a leftmost derivation of the input from the start symbol.
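Recording the consumed input plus the stack after each prediction makes this explicit (same sketch, same assumptions as above):

def ll_derivation(start, is_terminal, predict, tokens):
    """Same loop as ll_parse, but collect the partially-derived string
    (consumed input + stack) after each prediction.  Assumes success."""
    stack, pos = [start], 0
    steps = [[start]]             # 0 derivation steps applied so far
    while stack:
        top = stack[0]
        if is_terminal(top):
            assert tokens[pos] == top, "match failure"
            stack.pop(0)          # matching changes no derived string
            pos += 1
        else:
            stack[0:1] = predict(top, tokens[pos:])
            steps.append(list(tokens[:pos]) + stack)
    return steps                  # the leftmost derivation, in order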

Here's the complete parse for your example:

Consumed           Unconsumed   Partial      Production
Input      Stack   input        derivation   or other action
--------   -----   ----------   ----------   --------------- 
           N       1 2 3        N            N → N D
           N D     1 2 3        N D          N → N D
           N D D   1 2 3        N D D        N → D
           D D D   1 2 3        D D D        D → 1
           1 D D   1 2 3        1 D D        -- match --
1          D D       2 3        1 D D        D → 2
1          2 D       2 3        1 2 D        -- match --
1 2        D           3        1 2 D        D → 3
1 2        3           3        1 2 3        -- match --
1 2 3      --         --        1 2 3        -- success --

If you read the last two columns, you can see the derivation process starting from N and ending with 1 2 3. In this example, the prediction can only be made using magic, because the rule N → N D is not LL(k) for any k; using the right-recursive rule N → D N instead would allow an LL(2) decision procedure (for example, "use N → D N if there are at least two unconsumed input tokens; otherwise use N → D").
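To make those two prediction regimes concrete in terms of the ll_derivation sketch above (both oracles are illustrative, not canonical code):

def make_magic_oracle(n_tokens):
    """'Magic' oracle for the left-recursive grammar N -> N D | D.
    It must know the total input length (unbounded lookahead):
    N -> N D is applied exactly n_tokens - 1 times, then N -> D."""
    remaining_nd = n_tokens - 1
    def predict(nonterminal, unconsumed):
        nonlocal remaining_nd
        if nonterminal == 'N':
            if remaining_nd > 0:
                remaining_nd -= 1
                return ['N', 'D']
            return ['D']
        return [unconsumed[0]]        # D -> the next digit
    return predict

def predict_ll2(nonterminal, unconsumed):
    """LL(2) oracle for the right-recursive variant N -> D N | D:
    two tokens of lookahead are enough to pick the production."""
    if nonterminal == 'N':
        return ['D', 'N'] if len(unconsumed) >= 2 else ['D']
    return [unconsumed[0]]            # D -> the next digit

is_term = lambda s: s not in ('N', 'D')
# Reproduces the Partial derivation column of the table above:
print(ll_derivation('N', is_term, make_magic_oracle(3), ['1', '2', '3']))
# Same language, decided with only two tokens of lookahead:
print(ll_derivation('N', is_term, predict_ll2, ['1', '2', '3']))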

The chart you are trying to produce, which starts with 1 2 3 and ends with N, is a bottom-up parse. Bottom-up parses using the LR algorithm correspond to rightmost derivations, but the derivation needs to be read backwards, since it ends with the start symbol.
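To spell that out for this example (an added illustration with the same grammar), the reduction sequence and the rightmost derivation are the same chain read in opposite directions:

1 2 3 -> D 2 3 -> N 2 3 -> N D 3 -> N 3 -> N D -> N     (reductions, as found by an LR parser)
N -> N D -> N 3 -> N D 3 -> N 2 3 -> D 2 3 -> 1 2 3     (the rightmost derivation, read forwards)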

  • Thanks. So with an assistive stack and a proper way to make the prediction decision, the parsing can be carried out. But I am not quite clear about this sentence: *because the rule `N → N D` is not `LL(k)` for any k; using the right-recursive rule `N → D N` instead would allow an `LL(2)` decision procedure*. In my opinion, with the rule `N → N D`, if there are at least two unconsumed input tokens, I still have to apply `N → N D`. I don't see any other choice here. – smwikipedia Jan 09 '17 at 02:24
  • 1
    @smwikipedia: replacing `N` with `N D` never consumes an input token. You have to do it exactly as many times as you have inputs, but since it doesn't consume an input you can't know how many times without examining the entire input stream, an unbounded lookahead. By contrast, after you replace `N` with `D N`, the next two steps must consume the input token corresponding to the `D`, allowing the parse to continue. So the stack always consists of just `D N` and it is only when the next token is the last one left that you need to do something different. – rici Jan 09 '17 at 02:37
  • Thanks again. So it seems LR has some advantage over LL. I think it's because of the human instinct to process input from left to right. The left-most non-terminal looks like a *blockade* that prevents the input from being consumed unless the whole input can be matched, while LR exposes *matchable* terminals as soon as possible and thus can process the input progressively. But is it possible that LR makes some unwise match? If that happens, will it roll back? – smwikipedia Jan 09 '17 at 02:47
  • 1
    @smwikipedia: I definitely think LR is a superior algorithm. Indeed, I don't think there is any doubt about that because *every* LL(k) language can be described by an LR(1) grammar, but there are many LR(1) languages which have no LL(k) grammar. Since both algorithms are linear time and constant space (and the constants are comparable), there's no efficiency argument, either. But that's not exactly the same thing as you are asking... – rici Jan 09 '17 at 02:51
  • 1
    ... the key, as I mentioned, is in how decisions are made. The basic left-to-right parsing algorithms, top-down and bottom-up, are frameworks which can work with any accurate decision algorithm. LL and LR are specific instances of those respective frameworks in which the decision is made by consulting a lookup table using the next `k` unconsumed input items for some fixed `k`. As it happens, the LR algorithm will work with both left- and right-recursive rules, but the LL algorithm will not work with left-recursive rules, as illustrated by this trivial example. But you can usually convert... – rici Jan 09 '17 at 02:55
  • 1
    ... left-recursion to right-recursion, albeit at the cost of modifying the structure of the resulting parse tree. LR(k) parsing is not universal; there are many grammars (and also many languages) which cannot be parsed in this way, and there are different algorithms which can cope with these grammars. However, the different algorithms are no longer linear-time. One possibility is, indeed, to backtrack when a problem occurs, but backtracking is exponential-time in general. Another approach is "General" LR parsing, which can be done in cubic time. There are lots of links in wikipedia. – rici Jan 09 '17 at 02:59
  • ... The third last comment should have said "linear time and space", since a linear-sized stack is involved. It's only constant space for LR grammars if the only recursion in the grammar is right-recursion. – rici Jan 09 '17 at 03:00