I hope this is not seen as an ignorant question, but I am a compiler noob. I could not find any answer to this question by myself, but that might be because I lack the proper search terms.
I am trying to automatically find a pre-processor directive in source code and change it to something else. The directive appears in hundreds of files in a project I have to work on, and doing it by hand (i.e. search and replace) seems tedious because the same directive has to stay unchanged in other files.
I thought doing some libclang magic with the clang Python package could make it a bit more fun, and I've gotten a nice solution out of it that works for me. However, something is puzzling me: I had a problem where the new text would be inserted at the wrong spot, i.e. a couple of characters off.
A bit of context: I am looping through all tokens in the AST of a source file:
import clang.cindex
index = clang.cindex.Index.create()
translation_unit = index.parse(filename, args)
for tok in translation_unit.get_tokens():
    pass  # this is where my code goes
and if a token of kind TokenKind.PUNCTUATION appears whose spelling is "#", I check whether the following two tokens are the define I am looking for.
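To make the check concrete, here is a minimal sketch of the idea, not my actual code. `Tok` is a hypothetical stand-in for a libclang token (only `kind` and `spelling` are modelled, with the kind as a plain string instead of `TokenKind.PUNCTUATION`), and `OLD_MACRO` is a made-up macro name.

```python
from collections import namedtuple

# Hypothetical stand-in for a libclang token; only the two fields used here.
Tok = namedtuple("Tok", ["kind", "spelling"])


def find_define(tokens, macro_name):
    """Return every index i where tokens[i:i+3] spell '#', 'define', macro_name."""
    hits = []
    for i in range(len(tokens) - 2):
        first, second, third = tokens[i:i + 3]
        if (first.kind == "PUNCTUATION" and first.spelling == "#"
                and second.spelling == "define"
                and third.spelling == macro_name):
            hits.append(i)
    return hits


# Token stream for: "#define OLD_MACRO 1" followed by "int x;"
tokens = [
    Tok("PUNCTUATION", "#"), Tok("IDENTIFIER", "define"),
    Tok("IDENTIFIER", "OLD_MACRO"), Tok("LITERAL", "1"),
    Tok("KEYWORD", "int"), Tok("IDENTIFIER", "x"), Tok("PUNCTUATION", ";"),
]
print(find_define(tokens, "OLD_MACRO"))
```

With real tokens the loop would run over the list obtained from the translation unit instead of the hand-written `tokens` list.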
I then take the current token's extent.begin_int_data field and the last token's extent.end_int_data field and replace the text between them. This gave me something like
#d<my replaced string>
instead of <my replaced string>.
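The symptom can be reproduced without libclang. The sketch below is only an illustration: `replace_span` is a hypothetical helper, the source line and the macro name `OLD_MACRO` are made up, and the offsets are written by hand rather than read from tokens.

```python
def replace_span(data, start, end, new):
    """Replace data[start:end] (byte offsets, end exclusive) with new."""
    return data[:start] + new + data[end:]


source = b"#define OLD_MACRO 1\n"
new = b"<my replaced string>"
span_len = len(b"#define OLD_MACRO")

# Offsets counted from the start of the file give the intended result.
good = replace_span(source, 0, span_len, new)

# The same span shifted by 2 keeps "#d" and also eats two bytes after the span.
bad = replace_span(source, 2, span_len + 2, new)

print(good)
print(bad)
```

So a constant shift of 2 in both numbers would explain the leftover `#d` at the front.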
In the end I subtracted the extent.begin_int_data of the very first token from those numbers to get the correct offsets. But I am wondering why the first token's extent.begin_int_data is not 0. In my case the number is 2. I've tried putting other keywords at the beginning of the file, but the number stays the same, so it seems to be independent of the starting token.
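The workaround amounts to something like the following minimal sketch. `relative_span` and `fake_tok` are hypothetical names, and the fake tokens only mimic the two fields I read; their values are illustrative, built by adding 2 to the positions in `#define OLD_MACRO`. It also assumes that the first token really starts at byte 0 of the file, which would not hold if the file began with whitespace.

```python
from types import SimpleNamespace


def relative_span(first_tok, start_tok, end_tok):
    """Rebase raw begin/end values on the first token's raw begin value."""
    base = first_tok.extent.begin_int_data
    return (start_tok.extent.begin_int_data - base,
            end_tok.extent.end_int_data - base)


def fake_tok(begin, end):
    """Build a stand-in token exposing only extent.begin/end_int_data."""
    return SimpleNamespace(
        extent=SimpleNamespace(begin_int_data=begin, end_int_data=end))


# "#define OLD_MACRO" with every position shifted by 2 (illustrative values).
hash_tok = fake_tok(2, 3)     # "#"
define_tok = fake_tok(3, 9)   # "define"
name_tok = fake_tok(10, 19)   # "OLD_MACRO"

print(relative_span(hash_tok, hash_tok, name_tok))
```

The resulting pair can then be used directly as a slice into the file contents.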
Is there a reason for this? Maybe it has something to do with the file encoding? I am curious!
In the original clang source code the source range is only described as a character range, so from that description I would expect it to start at 0, just as an offset passed to fseek would.