
I have a 26-rule grammar for a sub-grammar of Mini Java. This grammar is supposed to be non-object-oriented. I've been trying to left-factor it and remove left recursion. However, when I test it with JFLAP, it tells me it is not LL(1). I have followed every step of the algorithm in the Aho-Sethi book.

Could you please give me some tips?

Goal ::= MainClass $
MainClass ::= class <IDENTIFIER> { MethodDeclarations public static void main ( ) {
    VarDeclarations Statements } }
VarDeclarations ::= VarDeclaration VarDeclarations | e
VarDeclaration ::= Type <IDENTIFIER> ;
MethodDeclarations ::= MethodDeclaration MethodDeclarations | e
MethodDeclaration ::= public static Type <IDENTIFIER> ( Parameters ) {
    VarDeclarations Statements return GenExpression ; }
Parameters ::= Type <IDENTIFIER> Parameter | e
Parameter ::= , Type <IDENTIFIER> Parameter | e
Type ::= boolean | int
Statements ::= Statement Statements | e
Statement ::= { Statements }
        |   if ( GenExpression ) Statement else Statement
        |   while ( GenExpression ) Statement
        |   System.out.println ( GenExpression ) ;
        |   <IDENTIFIER> = GenExpression ;
GenExpression ::= Expression | RelExpression
Expression ::= Term ExpressionRest
ExpressionRest ::= e | + Term ExpressionRest | - Term ExpressionRest
Term ::= Factor TermRest
TermRest ::= e | * Factor TermRest
Factor ::= ( Expression )
        |   true
        |   false
        |   <INTEGER-LITERAL>
        |   <IDENTIFIER> ArgumentList
ArgumentList ::= e | ( Arguments )
RelExpression ::= RelTerm RelExpressionRest
RelExpressionRest ::= e | && RelTerm RelExpressionEnd
RelExpressionEnd ::= e | RelExpressionRest
RelTerm ::= Term RelTermRest
RelTermRest ::= == Expression | < Expression | ExpressionRest RelTermEnding
RelTermEnding ::= == Expression | < Expression
Arguments ::= Expression Argument | RelExpression Argument | e
Argument ::= , GenExpression Argument | e 

Each <IDENTIFIER> is a valid Java identifier, and <INTEGER-LITERAL> is a simple integer. Each e production stands for an epsilon production, and the $ in the first rule is the end-of-file marker.

Milad Naseri

2 Answers


I think I spotted two problems (there might be more):

Problem #1

In MainClass you have

MethodDeclarations public static void main

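And MethodDeclarations is

MethodDeclaration MethodDeclarations | e

where each MethodDeclaration starts with "public static Type".

That's not LL(1): when the parser sees "public", it cannot tell whether another MethodDeclaration follows or whether MethodDeclarations should end (the e alternative) so that "public static void main" can be matched.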
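As a rough illustration of why one token is not enough here, the sketch below compares the opening tokens of an ordinary method declaration with those of the main method. The token lists and the helper name `lookahead_needed` are made up for this example; the tokens are read off the grammar in the question, taking `int` as the Type.

```python
# Opening tokens of an ordinary method (with Type = int) and of main,
# read off the MethodDeclaration and MainClass rules in the question.
method_start = ["public", "static", "int", "<IDENTIFIER>", "("]
main_start = ["public", "static", "void", "main", "("]


def lookahead_needed(a, b):
    """Return how many tokens must be read before the two sequences differ.

    Returns None if one sequence is a prefix of the other.
    (Hypothetical helper, not part of any parser generator.)
    """
    for k, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return k
    return None


print(lookahead_needed(method_start, main_start))  # 3
```

For this one decision, the two inputs only diverge at the third token (`int` or `boolean` versus `void`), so a single token of lookahead cannot separate them.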

Problem #2

Arguments ::= Expression Argument | RelExpression Argument | e

Both Expression:

Expression ::= Term ExpressionRest

... and RelExpression:

RelExpression ::= RelTerm RelExpressionRest
RelTerm ::= Term RelTermRest

... start with "Term" so that's not LL(1) either.
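A small sketch of that overlap, assuming it is enough to follow the leftmost symbol of each rule (which holds for this fragment because Term and Factor cannot derive e). The `LEFTMOST` table and the `first` function are illustrative names, not something JFLAP or any other tool provides.

```python
# Leftmost symbol(s) of each rule in the expression fragment of the question.
# None of these leftmost symbols can derive e, so following them
# is enough to get the set of starting terminals for this fragment.
LEFTMOST = {
    "Expression": ["Term"],
    "RelExpression": ["RelTerm"],
    "RelTerm": ["Term"],
    "Term": ["Factor"],
    "Factor": ["(", "true", "false", "<INTEGER-LITERAL>", "<IDENTIFIER>"],
}


def first(symbol):
    """Terminals that `symbol` can start with (simplified: no e handling)."""
    if symbol not in LEFTMOST:  # anything without a rule is a terminal
        return {symbol}
    result = set()
    for leftmost in LEFTMOST[symbol]:
        result |= first(leftmost)
    return result


overlap = first("Expression") & first("RelExpression")
print(sorted(overlap))
```

Every terminal that can start an Expression can also start a RelExpression, so no single lookahead token can pick between the two alternatives. The same overlap appears to affect `GenExpression ::= Expression | RelExpression`.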

I'd just go for LL(k) or LL(*) because they allow you to write much more maintainable grammars.

stmax
  • Thanks. Well, that's two, and I guess there are more. Isn't there a methodical way I can check for these loopholes? – Milad Naseri Jun 30 '12 at 22:35
  • The quickest and most reliable way is probably to let your parser generator do the LL(1) condition checks. Checking it yourself basically requires you to figure out all the terminal symbols each rule can start with. You know it's not LL(1) if some alternatives of a rule start with the same terminal symbols. That's pretty much the same thing your parser generator does, and what I did when looking at your grammar. You'll get a good feel for it after working with grammars for a while, but to be sure, ask the parser generator :) – stmax Jun 30 '12 at 23:04
  • The Wikipedia entry on [Constructing an LL(1) parsing table](http://en.wikipedia.org/wiki/LL_parser#Constructing_an_LL.281.29_parsing_table) provides a detailed description of this method. The last sentence is important: "If the table contains at most one rule in every one of its cells, then the parser will always know which rule it has to use and can therefore parse strings without backtracking. **It is in precisely this case that the grammar is called an LL(1) grammar.**" – stmax Jun 30 '12 at 23:11
  • Thanks. I will do this. I know that there is no precise 'answer' to this sort of question, therefore I'm gonna accept yours, and do more work on it. – Milad Naseri Jul 01 '12 at 00:16
  • In the end, I used ANTLRWorks to fix the grammar and generate a proper parser, though the result was not LL(1), rather LL(*). – Milad Naseri Jul 15 '12 at 06:17
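The table-based check mentioned in the comments can be sketched roughly as follows. This is a minimal, unoptimized version of the textbook FIRST/FOLLOW construction, not the code of JFLAP or ANTLR. The function names, the `EPS` marker, and the shortened rule bodies in `GRAMMAR` are assumptions made for the example; e alternatives are written as empty lists.

```python
from collections import defaultdict

EPS = ""  # internal marker for the empty string (the "e" of the question)


def first_of_seq(seq, first, nonterminals):
    """FIRST of a symbol sequence; contains EPS if the whole sequence can vanish."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in nonterminals else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out


def ll1_conflicts(grammar, start):
    """Return (nonterminal, lookahead, alternatives) for every table cell with >1 rule."""
    nts = set(grammar)
    first = {nt: set() for nt in nts}
    follow = {nt: set() for nt in nts}
    follow[start].add("$")
    changed = True
    while changed:  # iterate until FIRST and FOLLOW stop growing
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_seq(alt, first, nts)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
                for i, sym in enumerate(alt):
                    if sym not in nts:
                        continue
                    rest = first_of_seq(alt[i + 1:], first, nts)
                    new = rest - {EPS}
                    if EPS in rest:  # everything after sym can vanish
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    table = defaultdict(list)
    for nt, alts in grammar.items():
        for alt in alts:
            f = first_of_seq(alt, first, nts)
            lookaheads = (f - {EPS}) | (follow[nt] if EPS in f else set())
            for terminal in lookaheads:
                table[nt, terminal].append(alt)
    return [(nt, t, alts) for (nt, t), alts in sorted(table.items()) if len(alts) > 1]


# Fragment of the question's grammar with rule bodies shortened for brevity.
GRAMMAR = {
    "MainClass": [["class", "<IDENTIFIER>", "{", "MethodDeclarations",
                   "public", "static", "void", "main", "}"]],
    "MethodDeclarations": [["MethodDeclaration", "MethodDeclarations"], []],
    "MethodDeclaration": [["public", "static", "Type", "<IDENTIFIER>", ";"]],
    "Type": [["boolean"], ["int"]],
}

for nt, terminal, alts in ll1_conflicts(GRAMMAR, "MainClass"):
    print(f"{nt} on {terminal!r}: {len(alts)} candidate rules")
```

On this fragment the only cell with more than one rule is MethodDeclarations on `public`, which matches Problem #1 above. Feeding in the full 26-rule grammar the same way should list the remaining conflicts, assuming the rules are transcribed faithfully.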

Is there anything to prevent IDENTIFIER from being the same as one of your reserved words? If not, your grammar would be ambiguous. I don't see anything else, though.

If all else fails, I'd remove all but the last line of the grammar, and test that. If that passes I'd add each line one at a time until I found the problem line.

ams
  • IDENTIFIER is guaranteed to not be a keyword. Consider it to be {all words not starting with a number} - {Java keywords}. – Milad Naseri Jun 30 '12 at 21:38