
I have a templated class SafeInt<T> (by Microsoft).

This class in theory can be used in place of a POD integer type and can detect any integer overflows during arithmetic operations.

For this class I wrote some custom templated overloads of the arithmetic operators (+, -, *, /), both of whose arguments are SafeInt<T> objects.

I typedef'd all my integer types to SafeInt class type.

I want to search my codebase for instances of the said binary operators where both operands are of type SafeInt.

Some of the ways I could think of:

  1. String search using regex and weed through the code to detect operator usage instances where both operands are SafeInt objects.

  2. Write a clang tool and process the AST to do this searching. (I have yet to learn how to write such a tool.)

  3. Somehow add a counter to count the number of times the custom overloaded operator is instantiated. I spent a lot of time trying this, but it doesn't seem to work.

Can anyone suggest a better way?

Please let me know if I need to clarify anything.

Thanks.

0x97c8
  • 1. regex: no way no way no way. 2. clang: most robust solution, but you would have to justify the time and energy to learn and build such a tool, which brings me to my question: what do you need this for? – bolov Apr 30 '16 at 08:30
  • (temporarily) define them as deleted and see where the compiler complains? – T.C. Apr 30 '16 at 08:30
  • @T.C. genius! Pure genius! – bolov Apr 30 '16 at 08:31
  • @bolov So SafeInt has these operators defined as its class members. But there is no definition for operators where both operands are SafeInt, which makes sense. For example, if I have two operands of type SafeInt and SafeInt, there is no right way to determine the output type. If the types were the corresponding native types, integer promotion rules would give a result of unsigned type. But depending on the context, that could mean an overflowed value or not. So I defined the missing operators and want to estimate how many places in the codebase I have to analyze for context. – 0x97c8 Apr 30 '16 at 08:36
  • @T.C. Once I delete the overload, I would probably not get all the instances (at once) where the operator is not defined. I have to verify this though. – 0x97c8 Apr 30 '16 at 08:40
  • @0x97c8 I think you are going about it the wrong way. It really doesn't matter how many calls to these operators are written in the program. What actually matters is how many times they are called at runtime. E.g. you could have 1000 calls written and at runtime they could be called only 500 times in total, or you could have 10 calls written, but in a hot spot, so they are called millions of times. – bolov Apr 30 '16 at 08:40
  • So what you really need is a profiler. Fortunately, Visual Studio has a very very good one. – bolov Apr 30 '16 at 08:41
  • @bolov I agree with the comment about number of calls versus number of invocations. But each call is used in a particular context, and based on that context I can change the operands to use an overload which is defined in the class. Basing my changes on the number of invocations is not possible, since I cannot cover all inputs to test all invocations. – 0x97c8 Apr 30 '16 at 08:48
  • @0x97c8 I misunderstood you, sorry. Well... I think I get now what you are trying to do, but still think you are going the wrong way. When you design an interface (in your case your operators) you have to have in mind from the start how it is going to be used. You don't need to see actual calls to say, hey we need to add any int type with any int type, or we need to restrict to only adding the same int type. – bolov Apr 30 '16 at 08:52
  • @bolov I agree with the philosophy. But I am trying to port an existing codebase to use a library provided by Microsoft. SafeInt<> is publicly released by MSFT. Given that, the library might not be usable out of the box. To achieve the benefits of the library, the codebase has to be refactored. And that is what I am trying to do. – 0x97c8 Apr 30 '16 at 08:56
  • BTW I didn't know what an XY problem was. So that was a nice 'TIL' :) – 0x97c8 Apr 30 '16 at 08:57
  • So what you really want to know, is for any overloaded operator, precisely where that operator has been used? – Ira Baxter May 10 '16 at 07:56
  • @Ira Yes. That's what I am looking for. – 0x97c8 May 10 '16 at 13:39
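
T.C.'s suggestion in the comments (temporarily define the two-object operators as deleted) might look roughly like the sketch below. It is only an illustration under stated assumptions: Checked<T> is a hypothetical stand-in for SafeInt<T>, not the real class, and the can_add helper exists purely to show at compile time which expressions would be rejected. It needs C++17 for std::void_t.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for SafeInt<T>; the real class lives in SafeInt.hpp.
template <typename T>
struct Checked {
    T value;
    // Member overload taking a native integer (assumed for illustration).
    Checked operator+(T rhs) const { return Checked{static_cast<T>(value + rhs)}; }
};

// The custom two-object overload, temporarily marked as deleted.
// Any expression that selects it becomes a compile error at the call site.
template <typename T, typename U>
Checked<T> operator+(const Checked<T>&, const Checked<U>&) = delete;

// Detection helper: true when the expression A + B is well-formed.
template <typename A, typename B, typename = void>
struct can_add : std::false_type {};

template <typename A, typename B>
struct can_add<A, B,
               std::void_t<decltype(std::declval<A>() + std::declval<B>())>>
    : std::true_type {};

// Object + native integer still compiles via the member overload.
static_assert(can_add<Checked<int>, int>::value, "object + native compiles");
// Object + object now selects the deleted overload and is rejected.
static_assert(!can_add<Checked<int>, Checked<int>>::value, "object + object rejected");
static_assert(!can_add<Checked<int>, Checked<unsigned>>::value, "mixed types rejected");
```

In a real build the compiler would report an error at each offending expression. Compilers generally keep going after such errors within a translation unit, but they may stop at an error limit (clang has -ferror-limit, GCC has -fmax-errors), so raising that limit may be needed to see every use site in one pass.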

2 Answers


Short answer

You can do this using the clang-query command:

$ clang-query \
  -c='m cxxOperatorCallExpr(callee(functionDecl(hasName("operator+"))), hasArgument(0, expr(hasType(cxxRecordDecl(hasName("SafeInt"))))), hasArgument(1, expr(hasType(cxxRecordDecl(hasName("SafeInt"))))))' \
  use-si.cc --

Match #1:

/home/scott/wrk/learn/clang/clang-query1/use-si.cc:10:3: note: "root" binds here
  x + y;           // reported
  ^~~~~
1 match.

What is clang-query?

clang-query is a utility intended to facilitate writing clang-tidy checks. In particular it understands the language of AST Matchers and can be used to interactively explore what is matched by a given match expression. However, as shown here, it can also be used non-interactively to look for arbitrary AST tree patterns.

The blog post Exploring Clang Tooling Part 2: Examining the Clang AST with clang-query by Stephen Kelly provides a nice introduction to using clang-query.

The clang-query program is included in the pre-built LLVM binaries, or it can be built from source as described in the AST Matchers Tutorial.

How does the above command work?

The -c argument provides a command to run non-interactively. With whitespace added, the command is:

m                                  // Match (and report) every
cxxOperatorCallExpr(               // operator function call
  callee(functionDecl(             // where the callee
    hasName("operator+"))),        // is "operator+", and
  hasArgument(0,                   // where the first argument
    expr(hasType(cxxRecordDecl(    // is a class type
      hasName("SafeInt"))))),      // called "SafeInt",
  hasArgument(1,                   // and the second argument
    expr(hasType(cxxRecordDecl(    // is also a class type
      hasName("SafeInt"))))))      // called "SafeInt".

The command line ends with use-si.cc --, meaning to analyze use-si.cc and there are no extra compiler flags needed by clang to interpret it.

The clang-query command line has the same basic structure as that of clang-tidy, including the ability to pass -p with the build directory that contains compile_commands.json, so that many files can be scanned at once, possibly with different compiler options per file.

Example input

For completeness, the input I used to test my matcher is use-si.cc:

// use-si.cc

#include "SafeInt.hpp"         // SafeInt

void f1()
{
  SafeInt<int> x(2);
  SafeInt<int> y(3);

  x + y;           // reported

  x + 2;           // not reported

  2 + x;           // not reported
}

where SafeInt.hpp comes from https://github.com/dcleblanc/SafeInt, the repo named on the Microsoft SafeInt page.

Scott McPeak

To do this right, you clearly have to be able to identify the individual uses of the operator that resolve to a specific operator definition. Fundamentally, you need what the front end of a C++ compiler does: parsing and name resolution (including overload resolution).

Obviously GCC and Clang have this basic capability. But you want to track/display all uses of the specific operator. You can probably bend Clang (or GCC, harder) to give you this information on a file-by-file basis.

Our DMS Software Reengineering Toolkit with its C++ Front End can be used for this, too. DMS provides the generic parsing and symbol table support machinery; the C++ front end specializes DMS to handle C++ with full, accurate name resolution including overloads, for both GCC5 and MSVS2015. Its symbol table actually collects, for each declaration in a scope, the point of the declaration, and the list of uses of that declaration in terms of accurate source positions. The symbol scopes include an entry for each (overloaded) operator valid in the scope. You could just go to the desired symbol table entry and enumerate/count the list of references to get a raw count. There are standard APIs for this available via DMS.

The same kind of symbol scope/definition/uses information is used by our Java Source Browser to build an HTML-based JavaDoc-like display with full HTML linkages between symbol declarations and uses. So for any symbol declaration, you can easily see the uses.

The C++ front end has a similar HTMLizer that operates on C++ source code. It isn't as mature/pretty, but it is robust. It presently doesn't show all the uses of a declared symbol, but that would be a pretty straightforward change to make to it. (I don't have a publicly visible instance of it. Contact me through my bio and I can send you a sample).

Ira Baxter