I am trying to preprocess my C++ source files with ANTLR. I would like to output each input file with all of its original whitespace formatting preserved, while inserting some new source code of my own at the appropriate locations.
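I know that preserving whitespace requires a lexer rule like this one, which sends whitespace to the hidden channel instead of discarding it:
    WS : (' ' | '\n' | '\r' | '\t' | '\f')+ {$channel=HIDDEN;} ;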
With this rule, the $text attribute of each parser rule contains the hidden whitespace that lies between that rule's tokens. The problem is that, for any parser rule, $text only includes the input text from the first token the rule matches through its last token. For example, suppose this is my input (note the formatting whitespace before and between the tokens):
line 1; line 2;
And suppose I have two separate parser rules that match "line 1;" and "line 2;" individually, but no rule that matches the whole line " line 1; line 2;". Then the leading whitespace and the whitespace between "line 1;" and "line 2;" are lost: none of my rules can access them.
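To make the problem concrete, here is a small C++ model of it. It is not ANTLR code, just a sketch under simplifying assumptions, and `Token`, `lex`, `ruleText` and `fullText` are hypothetical names. A toy lexer puts each whitespace run on a "hidden" channel, as the WS rule does; `ruleText` mimics a rule's $text by joining the tokens from the rule's first token to its last; `fullText` joins the entire token stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token model: hidden tokens stay in the token stream,
// but a parser only matches the non-hidden ones.
struct Token {
    std::string text;
    bool hidden;  // true for whitespace, as with $channel=HIDDEN
};

// Same character set as the WS rule: ' ', '\n', '\r', '\t', '\f'.
bool isWs(char c) {
    return c == ' ' || c == '\n' || c == '\r' || c == '\t' || c == '\f';
}

// Toy lexer: each whitespace run becomes one hidden token, ';' is a token
// of its own, and any other run of characters becomes one visible token.
std::vector<Token> lex(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        std::size_t j = i + 1;
        if (isWs(input[i])) {
            while (j < input.size() && isWs(input[j])) ++j;
        } else if (input[i] != ';') {
            while (j < input.size() && !isWs(input[j]) && input[j] != ';') ++j;
        }
        tokens.push_back({input.substr(i, j - i), isWs(input[i])});
        i = j;
    }
    return tokens;
}

// Mimics a parser rule's $text: everything from the rule's first token
// (start) through its last token (stop), hidden tokens in between included.
std::string ruleText(const std::vector<Token>& tokens, std::size_t start, std::size_t stop) {
    assert(start <= stop && stop < tokens.size());
    std::string out;
    for (std::size_t k = start; k <= stop; ++k) out += tokens[k].text;
    return out;
}

// Walks the whole stream, hidden tokens included, so nothing is lost.
std::string fullText(const std::vector<Token>& tokens) {
    std::string out;
    for (const Token& t : tokens) out += t.text;
    return out;
}
```

For the input `"   line 1;   line 2;"` the toy lexer produces tokens 0 to 9: a rule matching `line 1;` spans tokens 1..4 and a rule matching `line 2;` spans tokens 6..9, so the whitespace tokens 0 and 5 fall outside every rule's $text, while `fullText` still reproduces the input exactly.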
What should I do to preserve ALL the whitespace while still letting my parser rules decide where to insert the new code?
EDIT
For example, say that whenever my code contains a call to function(1), with exactly 1 as the argument and not anything else, the preprocessor should insert a call to extraFunction() before it:
    void myFunction() {
        function();
        function(1);
    }
Becomes:
    void myFunction() {
        function();
        extraFunction();
        function(1);
    }
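A rough C++ model of this transformation, as a sketch that assumes a toy lexer rather than a real C++ grammar (`lex`, `isCallWithOne`, `separatorBefore` and `rewrite` are hypothetical names, and the code does not use ANTLR): `function(1)` is matched on the visible tokens only, so whitespace inside the call does not matter; the text to insert is recorded per token index; then every token, hidden whitespace included, is printed. The inserted call copies the line break and indentation in front of the original call so the output stays readable. The ANTLR 3 runtime's `TokenRewriteStream` is built around a similar record-then-render idea.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical token model: whitespace tokens are "hidden" (kept for
// output, ignored when matching), like $channel=HIDDEN.
struct Token {
    std::string text;
    bool hidden;
};

bool isWs(char c) {
    return c == ' ' || c == '\n' || c == '\r' || c == '\t' || c == '\f';
}

bool isWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Toy lexer (not a real C++ lexer): whitespace runs and identifier/number
// runs become single tokens; any other character is a token of its own.
std::vector<Token> lex(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        std::size_t j = i + 1;
        if (isWs(input[i])) {
            while (j < input.size() && isWs(input[j])) ++j;
        } else if (isWordChar(input[i])) {
            while (j < input.size() && isWordChar(input[j])) ++j;
        }
        tokens.push_back({input.substr(i, j - i), isWs(input[i])});
        i = j;
    }
    return tokens;
}

// Index of the next non-hidden token after i (tokens.size() if none).
std::size_t nextVisible(const std::vector<Token>& tokens, std::size_t i) {
    std::size_t k = i + 1;
    while (k < tokens.size() && tokens[k].hidden) ++k;
    return k;
}

// True if the visible tokens starting at i are: function ( 1 )
// Hidden tokens between them are skipped, so "function( 1 )" also matches.
bool isCallWithOne(const std::vector<Token>& tokens, std::size_t i) {
    const char* pattern[] = {"function", "(", "1", ")"};
    std::size_t k = i;
    for (const char* expected : pattern) {
        if (k >= tokens.size() || tokens[k].hidden || tokens[k].text != expected) return false;
        k = nextVisible(tokens, k);
    }
    return true;
}

// Text placed between the inserted call and token i: the line break and
// indentation in front of token i if it starts a line, else one space.
std::string separatorBefore(const std::vector<Token>& tokens, std::size_t i) {
    assert(i < tokens.size());
    if (i > 0 && tokens[i - 1].hidden) {
        const std::string& ws = tokens[i - 1].text;
        std::size_t nl = ws.find_last_of('\n');
        if (nl != std::string::npos) {
            std::size_t from = (nl > 0 && ws[nl - 1] == '\r') ? nl - 1 : nl;  // keep "\r\n"
            return ws.substr(from);
        }
    }
    return " ";
}

// Record insertions first, then print every token (hidden ones included).
std::string rewrite(const std::string& input) {
    std::vector<Token> tokens = lex(input);
    std::map<std::size_t, std::string> insertBefore;  // token index -> new text
    for (std::size_t i = 0; i < tokens.size(); ++i)
        if (isCallWithOne(tokens, i))
            insertBefore[i] = "extraFunction();" + separatorBefore(tokens, i);
    std::string out;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        auto it = insertBefore.find(i);
        if (it != insertBefore.end()) out += it->second;
        out += tokens[i].text;
    }
    return out;
}
```

Keeping the edits in a separate map and rendering once means the token stream is never modified while it is being matched. Copying the indentation only makes sense under the stated assumption that the call starts its own statement line; a call inside an expression such as `x = function(1);` would need a statement-level match instead.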
The preprocessed output must remain human-readable, because people will continue to work on it. For this simple example a text editor could do the job, but there are more complicated cases that justify using ANTLR.