9

I'm newbie to flex. I'm trying to write a simple re-entrant lexer/scanner with flex. The lexer definition goes below. I get stuck with compilation errors as shown below (yyg issue):

reentrant.l:

/* Definitions */

digit           [0-9]
letter          [a-zA-Z]
alphanum        [a-zA-Z0-9]
identifier      [a-zA-Z_][a-zA-Z0-9_]+
integer         [0-9]+
natural         [0-9]*[1-9][0-9]*
decimal         ([0-9]+\.|\.[0-9]+|[0-9]+\.[0-9]+)

%{
    #include <stdio.h>

    #define ECHO fwrite(yytext, yyleng, 1, yyout)

    int totalNums = 0;
%}

%option reentrant
%option prefix="simpleit_"

%%

^(.*)\r?\n     printf("%d\t%s", yylineno++, yytext);

%%
/* Routines */

int yywrap(yyscan_t yyscanner)
{
    return 1;
}

int main(int argc, char* argv[])
{
    yyscan_t yyscanner;

    if(argc < 2) {
        printf("Usage: %s fileName\n", argv[0]);
        return -1;
    }

    yyin = fopen(argv[1], "rb");

    yylex(yyscanner);

    return 0;
}

Compilation errors:

vietlq@mylappie:~/Desktop/parsers/reentrant$ gcc lex.simpleit_.c 
reentrant.l: In function ‘main’:
reentrant.l:44: error: ‘yyg’ undeclared (first use in this function)
reentrant.l:44: error: (Each undeclared identifier is reported only once
reentrant.l:44: error: for each function it appears in.)
Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Viet
  • 17,944
  • 33
  • 103
  • 135

1 Answers1

13

For a reentrant lexer, all communication must include the state, which is contained within the scanner.

Anywhere in your program (e.g. inside main) you can access the state variables via special functions to which you will pass your scanner. E.g., in your original reentrant.l, you can do this:

yyscan_t scanner;
yylex_init(&scanner);
yyset_in(fopen(argv[1], "rb"), scanner);
yylex(scanner);
yylex_destroy(scanner);

I have renamed scanner to avoid confusion with yyscanner in the actions. In contrast with general C code, all your actions occur within a giant function called yylex, which is passed your scanner by the name yyscanner. Thus, yyscanner is available to all your actions. In addition, yylex has a local variable called yyg that holds the entire state, and most macros conveniently refer to yyg.

While it is true that you can use the yyin macro inside main by defining yyg as you did in your own Answer, that is not recommended. For a reentrant lexer, the macros are meant for actions only.

To see how this is implemented, you can always view the generated code:


/* For convenience, these vars
   are macros in the reentrant scanner. */
#define yyin yyg->yyin_r
...

/* Holds the entire state of the reentrant scanner. */
struct yyguts_t
...

#define YY_DECL int yylex (yyscan_t yyscanner)

/** The main scanner function which does all the work.
 */
YY_DECL
{
    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
...
}

There is lots more on the reentrant option in the flex docs, which include a cleanly compiling example. (Google "flex reentrant", and look for the flex.sourceforge link.) Unlike bison, flex has a fairly straight-forward model for reentrancy. I strongly suggest using reentrant flex with Lemon Parser, rather than with yacc/bison.

cdunn2001
  • 17,657
  • 8
  • 55
  • 45
  • Anything besides the above initialization that is needed? I get a crash in the internals, it appears that yy_cp is a null pointer after assignment to yyg->yy_c_buf_p. – Michael Sep 18 '12 at 04:50
  • It's hard to debug through comments. What I posted worked for me inside `main()` in *reentrant.l* from the OP. – cdunn2001 Sep 22 '12 at 00:54
  • 1
    I know this is really old - however I have been attempting to use lex. I have found that if I use BEGIN or any of the start condition macros, they still get the yyg undeclared message. I was, like you, able to get this simplistic main() only lexer working. I will attempt to define yyg in my function to see if that fixes it for my other functions that attempt to change stat conditions. – Derek Dec 19 '12 at 16:22