Here's a possible definition of YY_INPUT using getline(). It should work as long as no token includes both a newline character and the following character. (A token could include a newline character at the end.) Specifically, current_line will contain the last line of the current token.
On successful completion of the lexical scan, current_line will be freed and the remaining global variables reset so that another input can be lexically analysed. If the lexical scan is discontinued before end of input is reached (for example, because the parse was unsuccessful), an explicit call should be made to reset_current_line() in order to perform these tasks.
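#include <stdio.h>      /* getline (POSIX.1-2008), ferror, perror */
#include <stdlib.h>     /* free */
#include <string.h>     /* memcpy */
#include <sys/types.h>  /* ssize_t */

char* current_line = NULL;        /* buffer owned by getline() */
size_t current_line_alloc = 0;    /* allocated size of current_line */
ssize_t current_line_sent = 0;    /* bytes of the line already given to flex */
ssize_t current_line_len = 0;     /* length of the line held in current_line */

/* Free the line buffer and reset the state so another input can be scanned. */
void reset_current_line() {
    free(current_line);
    current_line = NULL;
    current_line_alloc = current_line_sent = current_line_len = 0;
}

/* Copy up to max_size bytes of the current line into buf, reading a new
 * line first if the previous one is used up. Returns 0 at end of input. */
ssize_t refill_flex_buffer(char* buf, size_t max_size) {
    ssize_t avail = current_line_len - current_line_sent;
    if (!avail) {
        current_line_sent = 0;
        avail = getline(&current_line, &current_line_alloc, stdin);
        if (avail < 0) {
            /* perror() appends ": " and the error text itself */
            if (ferror(stdin)) { perror("Could not read input"); }
            avail = 0;
        }
        current_line_len = avail;
    }
    if (!avail) {
        /* End of input (or read error): release the buffer. */
        reset_current_line();
        return 0;
    }
    if ((size_t)avail > max_size) avail = (ssize_t)max_size;
    memcpy(buf, current_line + current_line_sent, avail);
    current_line_sent += avail;
    return avail;
}

#define YY_INPUT(buf, result, max_size) \
    result = refill_flex_buffer(buf, max_size);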
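To see how a line longer than max_size is handed out in pieces, here is a minimal sketch of the same chunking logic outside flex. It is an illustration only: line_source and line_source_read are hypothetical names, and the stream and state are passed as parameters (instead of using stdin and globals) so that the behaviour can be checked in isolation.

```c
#define _POSIX_C_SOURCE 200809L  /* for getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical state bundle; mirrors the four globals used above. */
typedef struct {
    char  *line;   /* buffer owned by getline() */
    size_t alloc;  /* allocated size of line */
    size_t len;    /* length of the line currently held */
    size_t sent;   /* bytes of that line already handed out */
} line_source;

/* Hand out at most max_size bytes of the current line, fetching the next
 * line only when the previous one has been fully consumed.
 * Returns 0 at end of input (or on a read error) and releases the buffer. */
size_t line_source_read(line_source *src, FILE *in, char *buf, size_t max_size)
{
    /* A zero-sized request could not be told apart from end of input. */
    assert(max_size > 0);

    if (src->sent == src->len) {
        ssize_t n = getline(&src->line, &src->alloc, in);
        src->sent = 0;
        src->len = (n < 0) ? 0 : (size_t)n;
    }

    size_t avail = src->len - src->sent;
    if (avail == 0) {
        free(src->line);
        *src = (line_source){0};
        return 0;
    }

    if (avail > max_size) avail = max_size;
    memcpy(buf, src->line + src->sent, avail);
    src->sent += avail;
    return avail;
}
```

With a 4-byte buffer, the 12-byte line "hello world\n" would arrive as three chunks before the next line is read, which is why src->line (like current_line) always holds the line that the most recently delivered bytes came from.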
Although the above code does not depend on maintaining the current column position, you will need that position if you want to identify where the current token is in the current line. The following will help, provided you don't use yyless or yymore:
size_t current_col = 0, current_col_end = 0;

/* Call this in any token whose last character is \n,
 * but only after making use of column information.
 */
void reset_current_col() {
    current_col = current_col_end = 0;
}

#define YY_USER_ACTION \
    { current_col = current_col_end; current_col_end += yyleng; }
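As an illustration of what the column information can be used for, here is a rough sketch of a helper that echoes a line and underlines a token in it. The name format_token_marker and its signature are assumptions made for this example; they are not part of flex or of the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes `line` (without its trailing newline) to `out`,
 * followed by a second row that marks columns [col, col_end) with '^'.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if `out` is too small. */
int format_token_marker(char *out, size_t out_size,
                        const char *line, size_t col, size_t col_end)
{
    assert(col <= col_end);

    size_t len = strcspn(line, "\n");      /* ignore the trailing newline */
    if (col > len) col = len;              /* clamp to the visible line */
    if (col_end > len) col_end = len;
    size_t width = (col_end > col) ? col_end - col : 1;  /* at least one '^' */

    /* line + '\n' + padding + carets + '\n' */
    size_t total = len + 1 + col + width + 1;
    if (total >= out_size) return -1;

    memcpy(out, line, len);
    out[len] = '\n';

    char *marker = out + len + 1;
    for (size_t i = 0; i < col; i++) {
        /* Repeat tabs so the carets line up however tabs are rendered. */
        marker[i] = (line[i] == '\t') ? '\t' : ' ';
    }
    memset(marker + col, '^', width);
    marker[col + width] = '\n';
    marker[col + width + 1] = '\0';
    return (int)total;
}
```

An error-reporting function could call it as format_token_marker(msg, sizeof msg, current_line, current_col, current_col_end), after checking that current_line is not NULL, since the buffer is freed once end of input has been reached.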
If you are using this scanner with a parser that uses lookahead, it may not be sufficient to keep only one line of the input stream, since the lookahead token may be on a later line than the error token. Keeping several retained lines in a circular buffer would be a simple enhancement, but it is not at all obvious how many lines are necessary.
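One possible shape for such a circular buffer is sketched below. Everything in it is an assumption made for illustration: the names line_ring, line_ring_push, line_ring_get and line_ring_clear are hypothetical, and RETAINED_LINES is set to 4 arbitrarily, since the right number depends on the grammar.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RETAINED_LINES 4  /* arbitrary choice for this sketch */

typedef struct {
    char  *lines[RETAINED_LINES];  /* copies of the most recent lines */
    size_t next_lineno;            /* 0-based number the next line will get */
} line_ring;

/* Store a copy of `line`, evicting the oldest retained line if necessary.
 * Returns the line's number, or (size_t)-1 if the copy could not be made. */
size_t line_ring_push(line_ring *r, const char *line)
{
    assert(line != NULL);
    char *copy = strdup(line);
    if (copy == NULL) return (size_t)-1;

    size_t slot = r->next_lineno % RETAINED_LINES;
    free(r->lines[slot]);          /* free(NULL) is a no-op */
    r->lines[slot] = copy;
    return r->next_lineno++;
}

/* Return the retained text of line `lineno`, or NULL if that line has not
 * been stored yet or has already been evicted. */
const char *line_ring_get(const line_ring *r, size_t lineno)
{
    if (lineno >= r->next_lineno) return NULL;
    if (r->next_lineno - lineno > RETAINED_LINES) return NULL;
    return r->lines[lineno % RETAINED_LINES];
}

/* Free all retained lines and reset the ring for a new input. */
void line_ring_clear(line_ring *r)
{
    for (size_t i = 0; i < RETAINED_LINES; i++) free(r->lines[i]);
    *r = (line_ring){0};
}
```

In this sketch the scanner would push each line as it is read and record the returned line number with every token, so that an error message can look up the line of the error token even after the lookahead token has pulled in later lines. The cost is one extra copy per line.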