I am writing a lexical analyzer with JFlex. When the word co is matched, everything after it up to the end of the line must be ignored, because it is a comment. For the moment, I have a boolean variable that is set to true whenever this word is matched. If an identifier or an operator is matched after co and before the end of the line, I simply ignore it, because the Identifier and Operator rules each check that variable in an if condition.
Is there a better way to do this that gets rid of the if statement that appears everywhere?

Here is the code:

%% // Options of the scanner

%class Lexer     
%unicode        
%line      
%column      
%standalone 

%{
    private boolean isCommentOpen = false;
    private void toggleIsCommentOpen() {
        this.isCommentOpen = ! this.isCommentOpen;
    }
    private boolean getIsCommentOpen() {
        return this.isCommentOpen;
    }
%} 

Operators           = [\+\-]
Identifier          = [A-Z]*

EndOfLine           = \r|\n|\r\n

%%
{Operators}         {
                        if (! getIsBlockCommentOpen() && ! getIsCommentOpen()) {
                            // Do Code
                        }
                    }  
 
{Identifier}        {
                        if (! getIsBlockCommentOpen() && ! getIsCommentOpen()) {
                            // Do Code
                        }
                    }

"co"                {
                        toggleIsCommentOpen();
                    }

.                   {}

{EndOfLine}         {
                        if (getIsCommentOpen()) {
                            toggleIsCommentOpen();
                        }
                    }
mrjamaisvu

1 Answer

One way to do this is to use lexical states in JFlex. Every time the word co is matched, the lexer enters an exclusive state named COMMENT_STATE, in which it discards everything until the end of the line. At the end of the line, it returns to YYINITIAL. Here is the code:

%% // Options of the scanner

%class Lexer     
%unicode        
%line      
%column      
%standalone  

Operators           = [\+\-]
Identifier          = [A-Z]*

EndOfLine           = \r|\n|\r\n

%xstate COMMENT_STATE

%%
<YYINITIAL> {
    "co" {yybegin(COMMENT_STATE);}
}

<COMMENT_STATE> {
    {EndOfLine} {yybegin(YYINITIAL);}
    .           {}
}

{Operators} { /* Do Code */ }

{Identifier} { /* Do Code */ }

. {}

{EndOfLine} {}

With this approach, the lexer is simpler and more readable.
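
To make the state switching easier to follow, here is a minimal hand-written Java sketch of the same idea. It is not what JFlex generates; the class name CommentStateSketch and the method tokenize are hypothetical names chosen for illustration, and the "Do Code" actions are assumed to simply collect the matched text. The point it shows is that no per-rule if check is needed once the comment rules live in their own state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written equivalent of the two-state lexer above.
// This is an illustration only, not JFlex-generated code.
class CommentStateSketch {
    private enum State { YYINITIAL, COMMENT_STATE }

    // Returns the operators and identifiers found outside "co" comments.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.YYINITIAL;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (state == State.COMMENT_STATE) {
                // Exclusive state: only these two rules apply.
                // An end-of-line character leaves the state; anything else is dropped.
                if (c == '\r' || c == '\n') {
                    state = State.YYINITIAL;
                }
                i++;
            } else if (input.startsWith("co", i)) {
                // "co" opens a comment that runs to the end of the line.
                state = State.COMMENT_STATE;
                i += 2;
            } else if (c == '+' || c == '-') {
                // Operators rule; the action is assumed to record the token.
                tokens.add(String.valueOf(c));
                i++;
            } else if (c >= 'A' && c <= 'Z') {
                // Identifier rule: take the longest run of uppercase letters.
                int start = i;
                while (i < input.length()
                        && input.charAt(i) >= 'A' && input.charAt(i) <= 'Z') {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                // Catch-all rule: skip any other character, including line breaks.
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [AB, +, CD, EF, -, GH]
        System.out.println(tokenize("AB+CD co IGNORED - TEXT\nEF-GH"));
    }
}
```

The operator and identifier branches never test a flag, because they are unreachable while the scanner is in COMMENT_STATE. That is the role %xstate plays in the JFlex specification.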

mrjamaisvu