I'm trying to wrap my head around parser theory, and I keep finding the same example in different sources. The grammar goes approximately like this (simplified):

E = T
E = E + T
T = 0..9

So supposedly the string 2 + 2 will be parsed as follows ("|" separates the stack from the remainder of the input):

|2 + 2 <-can't reduce, shift
2|+ 2  <-reduce by T = 0..9
T|+ 2  <-reduce by E = T
E|+ 2  <-can't reduce, shift
E +|2  <-can't reduce, shift
E + 2| <-reduce by T = 0..9
E + T| <-reduction by E = E + T here?
E|     <-done

The question is: at the E + T step, the parser can apply two different reductions to the rightmost part of the stack: E = T (resulting in E + E) and E = E + T (resulting in E). I can't find a clear and concise explanation of how it chooses one over the other.

What am I missing?

Vindicar

2 Answers

What are the possible states?

0: Beginning
1: Just shifted 0..9 after State 0, recognize a T
2: Reduce State 1 to an E.
3: Just shifted + after State 2 or 5, looking for T
4: Just shifted 0..9 after State 3, recognize a T giving us E + T.
5: Reduce state 4 to an E
6: Reach the end of the input after state 2 or 5.

So we start in state 0. Shift a 2. We are now in state 1. Reduce, which takes us to state 2. Shift a +. We are now in state 3. We shift a 2. We are in state 4. We reduce to state 5. We reach the end of the input and wind up with an expression tree looking like the following:

  E
  |
E + T
|   |
T   2
|
2
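
The idea that the current state, not just the top of the stack, picks the reduction can be sketched as a small table-driven parser. This is only an illustration under my own assumptions: the state numbers, the `ACTION`/`GOTO` tables, and the rule names below are hypothetical and do not match the informal state list above. Note that the stack holds states rather than grammar symbols.

```python
# Hypothetical tables for the grammar E = T | E + T, T = 0..9.
# State numbering is an assumption of this sketch, not the numbering used above.

# Rule name -> (left-hand side, number of symbols on the right-hand side)
RULES = {
    "E=T": ("E", 1),
    "E=E+T": ("E", 3),
    "T=digit": ("T", 1),
}

# ACTION[state][token] -> ("shift", next_state) | ("reduce", rule) | ("accept",)
ACTION = {
    0: {"digit": ("shift", 3)},
    1: {"+": ("shift", 4), "$": ("accept",)},
    2: {"+": ("reduce", "E=T"), "$": ("reduce", "E=T")},
    3: {"+": ("reduce", "T=digit"), "$": ("reduce", "T=digit")},
    4: {"digit": ("shift", 3)},
    5: {"+": ("reduce", "E=E+T"), "$": ("reduce", "E=E+T")},
}

# GOTO[state][nonterminal] -> state entered once a reduction exposes `state`
GOTO = {0: {"E": 1, "T": 2}, 4: {"T": 5}}


def parse(text):
    """Return the reductions applied, in order; raise SyntaxError on bad input."""
    tokens = ["digit" if ch in "0123456789" else ch
              for ch in text if not ch.isspace()] + ["$"]
    states = [0]      # stack of states, not of grammar symbols
    reductions = []
    pos = 0
    while True:
        action = ACTION[states[-1]].get(tokens[pos])
        if action is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} at token {pos}")
        if action[0] == "shift":
            states.append(action[1])
            pos += 1
        elif action[0] == "reduce":
            lhs, length = RULES[action[1]]
            del states[-length:]                  # pop one state per right-hand-side symbol
            states.append(GOTO[states[-1]][lhs])  # the exposed state decides where we land
            reductions.append(action[1])
        else:  # accept
            return reductions


print(parse("2 + 2"))  # ['T=digit', 'E=T', 'T=digit', 'E=E+T']
```

A T recognized right at the start lands in state 2 (via `GOTO[0]`), where the only action is to reduce by E = T. A T recognized after `E +` lands in state 5 (via `GOTO[4]`), where the only action is to reduce by E = E + T. So there is never a choice to make at the E + T step.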
btilly
  • Ah, so the parser can be in different states despite having just applied the same reduction, depending on the history of reductions? Like an automaton can reach different states on the same input symbol, depending on which state was current before? – Vindicar Nov 28 '18 at 08:27
  • @Vindicar Yes. Exactly like an automaton. – btilly Nov 28 '18 at 16:19
  • @Vindicar Yes. You can think of it as exactly like an automaton which leaves and occasionally cleans up a breadcrumb trail as it goes. That breadcrumb trail turns into the parse tree. – btilly Nov 28 '18 at 18:02

According to the grammar, an E can never follow a +. This rules out the E = T production at this state.

To fully understand that, construct the parser tables by hand - the example is small enough to make this feasible.

Henry
  • See, that's the thing - the parser would need to know that E = T leads to a dead end a step later. How would it know that without trying? It can look ahead in the input string, but how can it look ahead in its own parsing tree? – Vindicar Nov 27 '18 at 10:34
  • The parser, when running, keeps not only the symbols on the stack but also the parser state. The state is different in the situation where reduction with this production is allowed and in the one where it is not. You should really try to construct the states by hand. – Henry Nov 27 '18 at 10:38
  • I will need to understand what a parser state is, first... still, thanks! – Vindicar Nov 27 '18 at 11:22