
Page 78 of the Flex user's manual says:

There is no way to write a rule which is "match this text, but only if it comes at the end of the file". You can fake it, though, if you happen to have a character lying around that you don't allow in your input. Then you can redefine YY_INPUT to call your own routine which, if it sees an EOF, returns the magic character first (and remembers to return a real EOF next time it's called).

I am trying to implement that approach. In fact I managed to get it working (see below). For this input:

Hello, world
How are you?@

I get this (correct) output:

Here's some text Hello, world
Saw this string at EOF How are you?

But I had to do two things in my implementation to get it to work; two things that I shouldn't have to do:

I had to call yyterminate(). If I don't call yyterminate() then the output is this:

Here's some text Hello, world
Saw this string at EOF How are you?
Saw this string at EOF

I shouldn't be getting that last line. Why am I getting that last line?

I don't understand why I had to write tmp[yyleng-1] = '\0'; (subtracting 1). I should be able to write tmp[yyleng] = '\0'; (without subtracting 1). Why do I need to subtract 1?

%option noyywrap
%{
int sawEOF = 0;
#define YY_INPUT(buf,result,max_size) \
{ \
   if (sawEOF == 1) \
      result = YY_NULL; \
   else { \
      int c = fgetc(yyin); \
      if (c == EOF) { \
         sawEOF = 1; \
         buf[0] = '@'; \
         result = 1; \
      } \
      else { \
         buf[0] = c; \
         result = 1; \
      } \
   } \
}
%}
EOF_CHAR @
%% 
[^\n@]*{EOF_CHAR}    { char *tmp = strdup(yytext); 
                       tmp[yyleng-1] = '\0'; 
                       printf("Saw this string at EOF %s\n", tmp); 
                       free(tmp);   /* release the copy made by strdup */
                       yyterminate();
                     }
[^\n@]+              { printf("Here's some text %s\n", yytext); }
\n                   { }
%%
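int main(int argc, char *argv[])
{ 
      if (argc < 2) {
            fprintf(stderr, "usage: %s input-file\n", argv[0]);
            return 1;
      }
      yyin = fopen(argv[1], "r");
      if (yyin == NULL) {   /* don't scan if the file could not be opened */
            perror(argv[1]);
            return 1;
      }
      yylex();
      fclose(yyin);
      return 0;
}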
Roger Costello
  • The point of `tmp[yyleng-1] = '\0';` is to overwrite the `@` with a 0. `tmp[yyleng]` is already '\0', because Flex always null-terminates `yytext`. Another alternative: replace `strdup` with `strndup`: `char* tmp = strndup(yytext, yyleng-1);` – rici May 27 '22 at 21:36
  • The reason you get two EOF reports is painfully obvious, assuming you wrote that code. Explain to your rubber duck what your `YY_INPUT` does, and reflect about what "don't allow in your input" means and why your input doesn't obey that requirement. – rici May 27 '22 at 21:42
  • Also possibly of interest: https://stackoverflow.com/q/72240697/1566221 – rici May 27 '22 at 22:01
  • @rici Sorry, I am still not seeing why I need yyterminate() – Roger Costello May 28 '22 at 12:34
  • You don't need yyterminate. You need to not put an `@` in your input. It's not allowed, as the FAQ says. The `@` in your input triggers the action (a false positive), and then the `@` you fabricate in `YY_INPUT` triggers it again. – rici May 28 '22 at 14:46
  • Despite being in the Flex FAQ, I don't think it's a very good suggestion. At a minimum, your `YY_INPUT` should verify that it's not receiving an `@` from the input, and produce an error if it does (although there's no mechanism for `YY_INPUT` to report errors, which is a limitation.) Also, reading input a byte at a time makes lexical analysis quadratic in token length instead of linear. That's maybe acceptable for interactive input with short tokens but it shouldn't be employed in production code, particularly if exposed to arbitrary user input. – rici May 28 '22 at 14:54
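
The last comment's two suggestions (reject an `@` that arrives from the real input, and read a block at a time instead of a byte at a time) could be sketched roughly as follows. This is only an illustration, not code from the Flex manual: `read_with_sentinel`, `SENTINEL`, and the `sentinel_sent` flag are hypothetical names, and the helper assumes `max_size` is at least 1.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sentinel; it must never occur in legitimate input. */
#define SENTINEL '@'

/*
 * Hypothetical helper (not part of Flex). Reads up to max_size bytes
 * from `in` into `buf` in one fread call.
 *
 * Returns:
 *   > 0  number of bytes stored in buf
 *     0  end of input (the sentinel has already been delivered)
 *    -1  the real input contained the reserved sentinel character
 *
 * *sentinel_sent must be 0 before the first call. Assumes max_size >= 1.
 */
static int read_with_sentinel(FILE *in, char *buf, size_t max_size,
                              int *sentinel_sent)
{
    if (*sentinel_sent)
        return 0;                 /* real EOF: sentinel was already returned */

    size_t n = fread(buf, 1, max_size, in);

    /* Reject a sentinel coming from the file itself (the false positive). */
    if (memchr(buf, SENTINEL, n) != NULL)
        return -1;

    if (n == 0) {                 /* nothing left: fabricate the sentinel once */
        buf[0] = SENTINEL;
        *sentinel_sent = 1;
        return 1;
    }
    return (int)n;
}
```

In a scanner, `YY_INPUT` could call such a helper with `yyin`, assign a non-negative return value to `result`, and call `YY_FATAL_ERROR` when it returns -1, since `YY_INPUT` itself has no way to report an error. Note that `fread` waits for a full block, so this sketch suits file input better than interactive input. With the sample input above, the `@` after "How are you?" would be rejected rather than matched as a false end-of-file marker.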
