Normalizing structural differences in grammars

Question

Consider the following grammar:

S → A | B
A → xy
B → xyz

This is what I think an LR(0) parser would do given the input xyz:

    | xyz   → shift
  x | yz    → shift
 xy | z     → reduce
  A | z     → shift
 Az |       → fail

If my assumption is correct and we changed rule B to read:

B → Az

now the grammar suddenly becomes acceptable by an LR(0) parser. I presume this new grammar describes the exact same set of strings than the first grammar in this question.

What are the differences between the first and second grammars?
How do we decouple structural differences in our grammars from the language they describe?
- Through normalization?
- What kind of normalization?

To further clarify:

I want to describe a language to a parser, without the structure of the grammar playing a role. I'd like to obtain the most minimal/fundamental description of a set of strings. For LR(k) grammars, I'd like to minimize the k.

I think `LR(0)` parser with `B->Az` will also fail for `xyz` ;). — shA.t, May 31 '15 at 15:29

score 3 · Answer 1 · edited May 23 '17 at 12:06

I think your LR(0) parser is not a standard parser:

Source

An LR(0) parser is a shift/reduce parser that uses zero tokens of lookahead to determine what action to take (hence the 0). This means that in any configuration of the parser, the parser must have an unambiguous action to choose - either it shifts a specific symbol or applies a specific reduction. If there are ever two or more choices to make, the parser fails and we say that the grammar is not LR(0).

So, When you have:

S->A|B
A->xy
B->Az

or

S->A|B
A->xy
B->xyz

LR(0) will never check B rule, And for both of them it will fail.

State0 - Clousure(S->°A):
   S->°A
   A->°xy

 Arcs:
  0 --> x --> 2
  0 --> A --> 1
-------------------------

State1 - Goto(State0,A):
   S->A°

 Arcs:
  1 --> $ --> Accept
-------------------------

State2 - Goto(State0,x):
   A->x°y

 Arcs:
  2 --> y --> 3
-------------------------

State3 - Goto(State2,y):
   A->xy°

 Arcs:
-------------------------

But if you have

I->S
S->A|B
A->xy
B->xyz                           or B->Az

Both of them will accept the xyz, but in difference states:

State0 - Clousure(I->°S):
   I->°S
   S->°A
   S->°B
   A->°xy                           A->°xy, $z
   B->°xyz                          B->°Az, $

 Arcs:
  0 --> x --> 4
  0 --> S --> 1
  0 --> A --> 2
  0 --> B --> 3
-------------------------

State1 - Goto(State0,S):
   I->S°

 Arcs:
  1 --> $ --> Accept
-------------------------

State2 - Goto(State0,A):
   S->A°                            S->A°, $ 
                                    B->A°z, $
 Arcs:                              2 --> z --> 5
-------------------------

State3 - Goto(State0,B):
   S->B°

 Arcs:
-------------------------

State4 - Goto(State0,x):
   A->x°y                            A->x°y, $z
   B->x°yz

 Arcs:
  4 --> y --> 5                     4 --> y --> 6
-------------------------

State5 - Goto(State4,y):            - Goto(State2,z):
   A->xy°                           B->Az°, $

 Arcs:
  5 --> z --> 6                     -<None>-
-------------------------

State6 - Goto(State5,z):            - Goto(State4,y)
   B->xyz°                          A->xy°, $z

 Arcs:
-------------------------

You can see the Goto Table and Action Table is different.

[B->xyz]                                  [B->Az]
  | Stack   | Input  | Action               | Stack   | Input  | Action
--+---------+--------+----------          --+---------+--------+----------
1 | 0       | xyz$   | Shift              1 | 0       | xyz$   | Shift 
2 | 0 4     | yz$    | Shift              2 | 0 4     | xy$    | Shift
3 | 0 4 5   | z$     | Shift              3 | 0 4 6   | z$     | Reduce A->xy
4 | 0 4 5 6 | $      | Reduce B->xyz      4 | 0 2     | z$     | Shift 
5 | 0 3     | $      | Reduce S->B        5 | 0 2 5   | $      | Reduce B->Az
6 | 0 1     | $      | Accept             6 | 0 3     | $      | Reduce S->B
                                          7 | 0 1     | $      | Accept

Simply when you change B->xyz to B->Az you add an Action to your LR Table to find the differences you can check Action Table and Goto Table (Constructing LR(0) parsing tables)

When you have A->xy and B->xyz then you have two bottom handles [xy or xyz] but when you have B->Az you have only one bottom handle [xy] that can accept an additional z.
I think related to local optimization -c=a+b; d=a+b -> c=a+b; d=c- when you use B->Az you make B->xyz optimized.

You might be right about the whether this langauge is LR(0) or not. I think you missed the point of his question. — Ira Baxter, May 31 '15 at 18:04
@shA.t Thanks for taking the time to answer. You provided two grammars. The question is why is one parseable by LR(0) and the other not, even though both grammars describe the same language. — thwd, May 31 '15 at 22:55
@thwd: Is that *really* the point of your question? The trivial answer is, "because one grammar is LR(0) [meaning literally that an LR(0) engine can parse with it] and the other is not", but I don't see how that gives you any useful information. I thought you were asking a more sophisticated question about, "how can I tell that two grammars (regardless of what parsing engine it takes to process them) represent the same language" (I don't think you can in general, maybe there are specific cases) and how one might normalize grammars to maximize the odds of seeing similarity or identity. — Ira Baxter, Jun 01 '15 at 05:38
@IraBaxter maybe I oversimplified. The deeper question is: if I can write two grammars for the same language and one is parseable in LR(0) whereas the other is LR(1) then the latter grammatical structure for the language is "needlessly complex". How can I describe a language to a parser without the grammar's structure having any influence? In other words, how do I arrive at the most fundamental definition of a language as a grammar? — thwd, Jun 01 '15 at 07:15
@thwd: well, I really like the direction of the question, a notion of grammar complexity or some kind of "minimal grammar". I don't know the answer, I here pushing because I like the questions. The question about "parser not influenced by grammar": you give a conventional grammar to a parser capable of doing context-free parsing, then it can parse. Earley and GLR parsers both have this ability (I have lots of experience with GLR parsers on this, see my bio). But that doesn't go to the heart of your question, I don't think. — Ira Baxter, Jun 01 '15 at 08:34

Normalizing structural differences in grammars

1 Answers1