
I am new to JFlex and CUP. I am trying to build a simple example, but whenever I run the parser it fails with the same error and never gets any further in recognizing the statements. I suspect the problem is in the productions. I have declared as terminals the punctuation symbols commonly used in Java, for example:

terminal LPAREN, RPAREN, RBRACE, LBRACE, LBRCKT, RBRCKT, COLON, SEMICOLON, ASSIGN, COMMA, DOT;
terminal PAGE, LABEL;

I define non-terminals as follows:

nonterminal START_PAGE;
nonterminal page_body, page_body_declarations_opt, page_body_declarations, page_body_declaration;

The grammar looks like this:

start with START_PAGE;
START_PAGE ::= PAGE page_body ;

page_body ::= LBRACE page_body_declarations_opt RBRACE ;
page_body_declarations_opt ::= | page_body_declarations ;
page_body_declarations ::= page_body_declaration
                         | page_body_declarations page_body_declaration ;
page_body_declaration ::= label_declaration ;
label_declaration ::= LABEL LPAREN RPAREN SEMICOLON ;
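
To make it easier to see which token sequences this grammar is meant to accept, here is a minimal hand-written recognizer in plain Java. It is only a sketch and not what CUP generates: the `Tok` enum and the `PageGrammarSketch` class are hypothetical names, and it assumes `page_body_declarations` means a list of one or more `page_body_declaration`.

```java
import java.util.List;

final class PageGrammarSketch {
    // Hypothetical token kinds mirroring the terminals the grammar uses.
    enum Tok { PAGE, LABEL, LBRACE, RBRACE, LPAREN, RPAREN, SEMICOLON }

    private final List<Tok> tokens;
    private int pos = 0;

    private PageGrammarSketch(List<Tok> tokens) {
        this.tokens = tokens;
    }

    // True when the whole token list matches START_PAGE.
    static boolean accepts(List<Tok> tokens) {
        PageGrammarSketch p = new PageGrammarSketch(tokens);
        return p.startPage() && p.pos == tokens.size();
    }

    // Next token, or null at end of input.
    private Tok peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    // Consume the next token only if it is the expected kind.
    private boolean eat(Tok expected) {
        if (peek() != expected) return false;
        pos++;
        return true;
    }

    // START_PAGE ::= PAGE page_body
    private boolean startPage() {
        return eat(Tok.PAGE) && pageBody();
    }

    // page_body ::= LBRACE page_body_declarations_opt RBRACE
    private boolean pageBody() {
        return eat(Tok.LBRACE) && declarationsOpt() && eat(Tok.RBRACE);
    }

    // page_body_declarations_opt: zero or more label declarations
    private boolean declarationsOpt() {
        while (peek() == Tok.LABEL) {
            if (!labelDeclaration()) return false;
        }
        return true;
    }

    // label_declaration ::= LABEL LPAREN RPAREN SEMICOLON
    private boolean labelDeclaration() {
        return eat(Tok.LABEL) && eat(Tok.LPAREN)
            && eat(Tok.RPAREN) && eat(Tok.SEMICOLON);
    }
}
```

With this reading, `PAGE LBRACE RBRACE` is accepted, while `PAGE RBRACE LBRACE` is rejected at the first brace, which is the kind of failure a lexer produces if it hands out the two brace tokens the wrong way round.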

The input file contains the following, starting at line 2:


Page {
}

When I run the program, I first print the tokens produced by the lexer and then run the parser, with the following result:

Token: #2 Page
Token: #51 {
Token: #52 }
Token: #0
Compiler has detected a syntax error at line 2 column 8
Error in line 2, column 8: Couldn't repair and continue parse

The error is reported at the LBRACE token.

The version I am using for testing is:

<jflex.version>1.8.2</jflex.version>
<cup.version>11b-20160615</cup.version>

The general idea is:

Define a page block that contains other elements, such as labels.

I would be grateful for any help getting this example to work.

There is very little information about how to define productions and how reductions work in CUP. All the examples I have found are arithmetic expressions, and none of them helped me solve this problem.

This is my lexer file:


// USER CODE

package core;

import java.io.Reader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java_cup.runtime.Symbol;
import java_cup.runtime.Scanner;
import java.nio.charset.StandardCharsets;
import java_cup.runtime.ComplexSymbolFactory;
import java_cup.runtime.ComplexSymbolFactory.Location;
import java_cup.runtime.ComplexSymbolFactory.ComplexSymbol;


/**
 * Lexer class.
 */

%% /*----------------------------------------------------------*/

%public
// Lexer class to generate
%class Lexer
%cupsym MSymbol
%function next_token
%implements MSymbol, Scanner
%type java_cup.runtime.Symbol

%unicode

%cupdebug

%char
%full

%line
%column

%eofval{
    
    return mSymbol(MSymbol.EOF);

%eofval}


/*--------------------------------------------------------------
    CODE COPIED INTO LEXER
  --------------------------------------------------------------*/
%{ 

    ComplexSymbolFactory symbolFactory;
    
    StringBuffer string = new StringBuffer();

    public Lexer(Reader in, ComplexSymbolFactory sf){
        this(in);
        symbolFactory = sf;
    }
    
    private Symbol mSymbol(int type) {
        return new Symbol(type, yyline, yycolumn);
    }
    
    private Symbol mSymbol(int type, Object value) {
        return new Symbol(type, yyline, yycolumn, value);
    }
    private void error(String message) {
        System.out.println("Error at line "+(yyline+1)+", column "+(yycolumn+1)+" : " + message);
    }

%}

/*--------------------------------------------------------------
    MACRO DECLARATIONS
  --------------------------------------------------------------*/
LineTerminator = \r|\n|\r\n
InputCharacter = [^\r\n]
//WhiteSpace     = {LineTerminator} | [\ ,\t,\f]
WhiteSpace     = [\ ,\t,\f,\t] | {LineTerminator}

/* comments */
Comment = {TraditionalComment} | {EndOfLineComment} | {DocumentationComment}

TraditionalComment   = "/*" [^*] ~"*/" | "/*" "*"+ "/"
// Comment can be the last line of the file, without line terminator.
EndOfLineComment     = "//" {InputCharacter}* {LineTerminator}?
DocumentationComment = "/**" {CommentContent} "*"+ "/"
CommentContent       = ( [^*] | \*+ [^/*] )*


%state STRING

%% /*----------------------------------------------------------*/

/* Keywords */

<YYINITIAL> {

/*-------------------------------------------------------------
    KEYWORDS
  -------------------------------------------------------------*/
    "Page"      { 
                    return mSymbol(MSymbol.PAGE, yytext()); 
                }
    
}   //------> End of Keywords


<YYINITIAL> {
    /* separators */
    "("             { return mSymbol(MSymbol.LPAREN, yytext()); }
    ")"             { return mSymbol(MSymbol.RPAREN, yytext()); }
    "{"             { return mSymbol(MSymbol.RBRACE, yytext()); }
    "}"             { return mSymbol(MSymbol.LBRACE, yytext()); }
    "["             { return mSymbol(MSymbol.LBRCKT, yytext()); }
    "]"             { return mSymbol(MSymbol.RBRCKT, yytext()); }
    ";"             { return mSymbol(MSymbol.SEMICOLON, yytext()); }
    ","             { return mSymbol(MSymbol.COMMA, yytext()); }
    "."             { return mSymbol(MSymbol.DOT, yytext()); }
    "="             { return mSymbol(MSymbol.ASSIGN, yytext()); }
    ":"             { return mSymbol(MSymbol.COLON, yytext()); }

    \"              { yybegin(STRING); string.setLength(0); }
    

    /* WHITESPACE */
    {WhiteSpace}    { /* ignore */ }

    /* comments */
    {Comment}       { /* ignore */ }
}

<STRING> {
    \"              {   
                        yybegin(YYINITIAL);
                        return mSymbol(MSymbol.STRING_LITERAL, string.toString()); 
                    }
    
}

/* error fallback */
[^]                 { 
                        this.error("Illegal character [ " + yytext() + " ]");
                    }

When I run the parser, the lexer detects the tokens. I don't know what happens with whitespace.

Token: #2 Page
Token: #51 {
Token: #52 }
Token: #0 
Compiler has detected a syntax error at line 2 column 8
Error in line 2, column 8 : Couldn't repair and continue parse

The text "Couldn't repair and continue parse" is produced by the CUP parser.

This is the summary printed when generating the parser:

------- CUP v0.11b 20160615 (GIT 4ac7450) Parser Generation Summary -------
  0 errors and 54 warnings
  62 terminals, 7 non-terminals, and 8 productions declared, 
  producing 15 unique parse states.
  54 terminals declared but not used.
  0 non-terminals declared but not used.
  0 productions never reduced.
  0 conflicts detected (0 expected).
  Code written to "Parser.java", and "MSymbol.java".
---------------------------------------------------- (CUP v0.11b 20160615 (GIT 4ac7450))

Code to run the parser:

    ComplexSymbolFactory csf = new ComplexSymbolFactory();
    Lexer lexer = new Lexer(new BufferedReader(new FileReader(args[0], StandardCharsets.UTF_8)), csf);
    ScannerBuffer lxrBuff = new ScannerBuffer(lexer);
    Parser mParser = new Parser(lxrBuff, csf);
    mParser.parse();

Thanks in advance

Marco Osorio
  • There's not enough information in your post for someone to answer. What does the lexer definition look like? Does the lexer definition handle whitespace correctly? – Jim Garrison Jan 07 '22 at 21:07
  • Doesn't CUP give you some kind of warning or error message when you generate the parser? – rici Jan 07 '22 at 21:10
  • Or maybe there is a copy&paste error in your code. Either way, an edit is needed. – rici Jan 08 '22 at 02:46
  • Hi @Jim-Garrison, thanks for the quick reply. The lexer is similar to the JFlex repo example https://github.com/jflex-de/jflex/blob/master/jflex/examples/cup-java-minijava/src/main/jflex/minijava.flex I think the macro WhiteSpace = {LineTerminator} | [ \t \f] is not working properly, because when I tested removing the space before \t it gave me a different error. – Marco Osorio Jan 08 '22 at 15:21
  • Hello, I have found an error in the separator symbols: RBRACE and LBRACE were swapped. That is not the whole problem, though, because after correcting it the error stays the same, that is, at the '}' symbol. Any ideas? Does the order in which symbols/tokens are declared matter, or the WhiteSpace macro? – Marco Osorio Jan 09 '22 at 19:52
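
On the WhiteSpace question: the lexer above defines the macro as `[\ ,\t,\f,\t]`. As far as I can tell, commas inside a JFlex character class are ordinary members, not separators, so that class also contains `,`. The sketch below uses `java.util.regex` only as an analogy (its character classes treat commas the same way); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class WhiteSpaceClassDemo {
    // Same members as the macro in the lexer: space, comma, tab, form feed.
    static final Pattern WITH_COMMAS = Pattern.compile("[ ,\\t,\\f,\\t]");
    // Probably the intended class: space, tab, form feed only.
    static final Pattern WITHOUT_COMMAS = Pattern.compile("[ \\t\\f]");

    // True when the whole string is one character of the comma-separated class.
    static boolean matchesWithCommas(String s) {
        return WITH_COMMAS.matcher(s).matches();
    }

    // True when the whole string is one character of the comma-free class.
    static boolean matchesWithoutCommas(String s) {
        return WITHOUT_COMMAS.matcher(s).matches();
    }
}
```

In this particular lexer the stray comma is probably harmless, because the `","` rule appears before `{WhiteSpace}` and both match a single character, so the earlier rule should win. Writing the class without commas still avoids the ambiguity.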

1 Answer

I found the solution to the problem after correcting the LBRACE and RBRACE tokens. I am using the latest version of JFlex, which integrates CUP, and my setup uses CUP's ComplexSymbolFactory. The lexer therefore has to create its tokens through that factory instead of calling new Symbol(...) directly, so I had to use this method to build them:

private Symbol mSymbol (String tokenName, int type, Object val) {
    Location left = new Location (yyline + 1, yycolumn + 1);
    Location right = new Location (yyline + 1, yycolumn + yylength ());

    return symbolFactory.newSymbol (tokenName, type, left, right, val);
}
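
To illustrate the arithmetic in that method, here is a small self-contained sketch. `Loc` is a hypothetical stand-in for CUP's `Location`, and the method names are made up; it only shows how the zero-based `yyline` and `yycolumn` counters map to 1-based positions.

```java
final class LocationSketch {
    // Hypothetical stand-in for CUP's Location: a 1-based line/column pair.
    static final class Loc {
        final int line;
        final int column;

        Loc(int line, int column) {
            this.line = line;
            this.column = column;
        }
    }

    // Start of the token: shift both zero-based counters by one.
    static Loc left(int yyline, int yycolumn) {
        return new Loc(yyline + 1, yycolumn + 1);
    }

    // End of the token: 1-based column of its last character.
    static Loc right(int yyline, int yycolumn, int yylength) {
        return new Loc(yyline + 1, yycolumn + yylength);
    }
}
```

For the `Page` keyword at the start of the second line (`yyline = 1`, `yycolumn = 0`, length 4), this gives a left position of line 2, column 1 and a right position of line 2, column 4. Each lexer action would then presumably pass the token name as well, for example `mSymbol("LBRACE", MSymbol.LBRACE, yytext())`.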

Thanks

Marco Osorio