I am building a lexer for a class on compilers. I have completed the assignment, but there is one point I am not fully satisfied with.
The language supports string literals with escape sequences, where a string literal is defined as a sequence of characters enclosed by double quotes (`"`) and an escape sequence starts with a backslash (`\`). The lexer is supposed to produce a token for string literals with the escape sequences already processed (such as replacing `\n` with a newline character and `\t` with a tab).
My question: is it possible to recognize such string literals, and process the escape sequences they contain, without copying the parts matched so far to a temporary buffer? If so, how?
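To make the question concrete, here is a minimal sketch of the buffer-based approach being asked about. It is written in Python purely for illustration; the function name `scan_string_literal` and the `ESCAPES` table are hypothetical, and the escapes beyond `\n` and `\t` are assumptions rather than part of the assignment.

```python
# Hypothetical escape table: only \n and \t are named in the question;
# the backslash and double-quote entries are assumed for illustration.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def scan_string_literal(src, pos):
    """Scan a string literal whose opening quote is at src[pos].

    Returns (value, next_pos): the literal's contents with escape
    sequences already processed, and the index just past the closing quote.
    """
    if pos >= len(src) or src[pos] != '"':
        raise ValueError("expected opening double quote")

    buf = []  # the temporary buffer the question would like to avoid
    i = pos + 1
    while i < len(src):
        ch = src[i]
        if ch == '"':
            # Closing quote: the token value is whatever was copied so far.
            return "".join(buf), i + 1
        if ch == "\\":
            i += 1  # look at the character after the backslash
            if i >= len(src):
                break  # input ended in the middle of an escape
            if src[i] not in ESCAPES:
                raise ValueError("unknown escape sequence: " + src[i])
            buf.append(ESCAPES[src[i]])
        else:
            buf.append(ch)
        i += 1
    raise ValueError("unterminated string literal")
```

The copy into `buf` happens for every character, even when the literal contains no escape sequences at all, which is the part that feels wasteful.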