yyinput uses up space in the buffer, although it doesn't let you recover the data it read from yytext. About the only reason for this behaviour that I've ever come up with is that it allows you to unput() as many of the characters as you input() without destroying yytext, which is useful if you're using input() as a way of peeking at the next input.
Whatever the reason, that means that you cannot use yyinput to avoid buffer reallocation. So you need to do the next best thing: handle long tokens in smaller pieces. For example, you could do something like this:
%%
    /* Variable is local to a call to yylex */
    std::string longtoken;

<STATE>{identifier} {
    /* Personally I'd prefer to use a regex pattern rather than an if here */
    if (is_long_prefix(yytext)) {
        longtoken.clear();
        BEGIN(STATE_LONG_IDENTIFIER);
    }
    else {
        yylval.str = strdup(yytext);
        return IDENTIFIER;
    }
    // ...
}

<STATE_LONG_IDENTIFIER>{
    /* Here we handle subtokens of up to 100 characters. The number
     * is arbitrary, but the nature of flex is that the resulting DFA
     * will have one state per repetition, and large repetitions create
     * a lot of states.
     */
    .{1,100}  { longtoken.append(yytext, yyleng); }
    \n        { yylval.str = strdup(longtoken.c_str());
                BEGIN(STATE);
                return IDENTIFIER;
              }
    /* An <<EOF>> action must not fall through, so terminate in case error() returns */
    <<EOF>>   { error("Unterminated long identifier"); yyterminate(); }
}
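To make the piecewise accumulation easier to see outside flex, here is a minimal C++ sketch that mimics what the STATE_LONG_IDENTIFIER rules do. The function collect_long_token and its parameters are hypothetical names invented for this illustration; they are not part of flex, and the real scanner does this work through the rule actions shown above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for the scanner loop: consumes `input` the way the
// STATE_LONG_IDENTIFIER rules would, taking at most `max_piece` characters
// per "match". Returns true if a terminating newline was found; `token`
// then holds the full long identifier and `pieces` the number of matches.
bool collect_long_token(std::string_view input, std::size_t max_piece,
                        std::string& token, std::size_t& pieces)
{
    assert(max_piece > 0);  // a zero-length piece would never make progress
    token.clear();
    pieces = 0;
    std::size_t pos = 0;
    while (pos < input.size()) {
        if (input[pos] == '\n')
            return true;  // corresponds to the \n rule: token complete

        // Corresponds to .{1,100}: the longest run of non-newline
        // characters, capped at max_piece.
        std::size_t end = input.find('\n', pos);
        if (end == std::string_view::npos)
            end = input.size();
        const std::size_t len = std::min(max_piece, end - pos);

        // Same shape as longtoken.append(yytext, yyleng).
        token.append(input.data() + pos, len);
        pos += len;
        ++pieces;
    }
    return false;  // corresponds to <<EOF>>: unterminated long identifier
}
```

With a 250-character line and max_piece set to 100, the token is assembled from three pieces (100, 100 and 50 characters), so no single match ever needs more than 100 characters of buffer.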