I'm trying to write a source-to-source translator using libTooling.

I'm using ASTMatchers to try to find if statements that don't have curly braces and then use a rewriter to add the braces.

The matcher I'm using is:

ifStmt(unless(hasDescendant(compoundStmt()))).bind("ifStmt")

Then I just get the start and end locations, and rewrite the curly braces.

Here's the source code for that:

if (const IfStmt *IfS = Result.Nodes.getNodeAs<clang::IfStmt>("ifStmt")) {
  const Stmt *Then = IfS->getThen();
  Rewrite.InsertText(Then->getLocStart(), "{", true, true);
  Rewrite.InsertText(Then->getLocEnd(), "}", true, true);
}

Now the problem is that for some reason the end location is always off by 2 characters. Why is this so?

Scott McPeak
Farzad Sadeghi
  • I tried using InsertTextAfterToken instead of InsertText. It always missed the semicolon, so now it's off by only one. – Farzad Sadeghi May 19 '16 at 10:59
  • There is discussion of this problem on the LLVM Discourse at https://discourse.llvm.org/t/extend-stmt-with-proper-end-location/54745 – Scott McPeak Aug 24 '23 at 17:31

2 Answers

The SourceLocation I was getting is off by one because it only covers the statement's last token, and the ";" is not part of that token. If anybody is wondering how to include the ";" in the range, you could use Lexer::MeasureTokenLength, add one to the result, and get the new SourceLocation by offset.
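
To make that arithmetic concrete, here is a minimal sketch that models it on a plain std::string rather than on a real SourceManager. The helper names measureTokenLength and locAfterSemi are hypothetical stand-ins, not Clang APIs, and the offset lastTokenStart plays the role of the location returned by getLocEnd(), which points at the first character of the statement's last token.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for Lexer::MeasureTokenLength: length of the
// identifier/number starting at `pos`, or 1 for any other character.
std::size_t measureTokenLength(const std::string &src, std::size_t pos) {
  assert(pos < src.size());
  auto isWord = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
  std::size_t end = pos;
  if (isWord(src[end])) {
    while (end < src.size() && isWord(src[end]))
      ++end;
  } else {
    ++end; // treat anything else as a one-character token
  }
  return end - pos;
}

// Offset just past the ';', ASSUMING the ';' directly follows the last token.
std::size_t locAfterSemi(const std::string &src, std::size_t lastTokenStart) {
  return lastTokenStart + measureTokenLength(src, lastTokenStart) + 1;
}
```

For "if (c) x = 42;" with lastTokenStart at the "4", this yields the offset just past the ";". For "x = 42 ;" it yields the offset of the ";" itself, so a "}" inserted there would land before the semicolon. The "+ 1" is only right when nothing separates the token from the semicolon.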

Farzad Sadeghi
  • This answer assumes that the semicolon follows immediately after the preceding token, with no intervening whitespace, which is not true in general. – Scott McPeak Aug 24 '23 at 17:33

This is a general issue with the Clang AST: it usually does not record the location of the final semicolon of a statement that ends in one. See the discussion "Extend Stmt with proper end location?" on the LLVM Discourse server.

To solve this problem, the usual approach is to start with the end location as stored in the AST, then use the Lexer class to advance forward until the semicolon is found. This is not 100% reliable because there can be intervening macros and preprocessing directives, but fortunately that is uncommon for the final semicolon of a statement.

There is an example of doing this in clang::arcmt::trans::findSemiAfterLocation in the Clang source code. The essence is these lines:

  // Lex from the start of the given location.
  Lexer lexer(SM.getLocForStartOfFile(locInfo.first),
              Ctx.getLangOpts(),
              file.begin(), tokenBegin, file.end());
  Token tok;
  lexer.LexFromRawLexer(tok);
  if (tok.isNot(tok::semi)) {
    if (!IsDecl)
      return SourceLocation();
    // Declaration may be followed with other tokens; such as an __attribute,
    // before ending with a semicolon.
    return findSemiAfterLocation(tok.getLocation(), Ctx, /*IsDecl*/true);
  }
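
The "advance forward until the semicolon is found" idea can also be modeled without Clang. The following is only a toy sketch over a std::string, not the Clang implementation: findLocAfterSemi is a hypothetical name, pos is assumed to be the offset just past the statement's last token, and std::string::npos plays the role of the invalid SourceLocation() returned above. Like the real code, it gives up if the next real token is not a semicolon, and it makes no attempt to handle macros or preprocessing directives.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Skip whitespace and comments starting at `pos`. Return the offset just
// past the ';' if that is the next token, otherwise std::string::npos.
std::size_t findLocAfterSemi(const std::string &src, std::size_t pos) {
  assert(pos <= src.size());
  while (pos < src.size()) {
    if (std::isspace(static_cast<unsigned char>(src[pos]))) {
      ++pos;
    } else if (src.compare(pos, 2, "//") == 0) {
      pos = src.find('\n', pos); // line comment: jump to its newline
      if (pos == std::string::npos)
        return std::string::npos;
    } else if (src.compare(pos, 2, "/*") == 0) {
      pos = src.find("*/", pos + 2); // block comment: jump past "*/"
      if (pos == std::string::npos)
        return std::string::npos;
      pos += 2;
    } else {
      break; // reached the next real token
    }
  }
  if (pos < src.size() && src[pos] == ';')
    return pos + 1;
  return std::string::npos;
}
```

In a real tool the same walk would be done by the Lexer, as in the excerpt above. As far as I know, Lexer::findLocationAfterToken with tok::semi covers the common case as well.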
Scott McPeak