5

I am trying to analyze a large C++ program. The program heavily uses STL container data structures like set, map, unordered set, unordered map, vector etc. Sometimes they are nested, e.g. map of sets.

I want to find out, in a particular run of the program, which containers hold the largest number of elements (i.e. largest value of size()). I can do minor edits to the program.

If there was a way to iterate over all the containers, or if there was a way to intercept the (size modifying) APIs of the containers, that might have been helpful. But these are not possible.

How would you approach this?

Addition: The platform is Linux; the compiler is either g++ or clang++.

Arun
  • 5
    I would use a memory monitoring tool such as Valgrind or Purify. – SergeyA Feb 16 '16 at 15:28
  • 1
    What platform are you on? I'd approach this by using a tool that hooks the global memory allocator. Visual Studio 2015 has powerful built in tools for analyzing allocations, tools exist for other platforms too. – mattnewport Feb 16 '16 at 16:57
  • @SergeyA: Can you please mention how Valgrind may be used here? – Arun Feb 16 '16 at 18:09
  • @mattnewport: It's Linux. – Arun Feb 16 '16 at 18:09
  • How large is your program? Millions of lines or only a few thousands? Can you spend weeks on your issue? (In that case customizing your compiler with [MELT](http://gcc-melt.org/) might be worthwhile) – Basile Starynkevitch Feb 16 '16 at 18:26

3 Answers

2

This method is useful when your project is really big and has very many instances of different containers. Its advantage is that you do not need to modify a large amount of code. It lets you narrow down the type of container to look for, and it helps diagnose the situation per container type and per element type.

It is possible to redefine template< class T > struct allocator. You can rename the original allocator in the std headers or modify it so that it collects statistics on allocation and deallocation. You will then know the count and size per element type, but you cannot know which container instance holds the elements.

The template template< class T > struct allocator lives in the library header files. It is always available there, and you do not need to rebuild the library of your development environment, because a template cannot be compiled into a static library (except for specializations). Templates are always compiled together with your sources. Precompiled headers may be a problem, though. For your own project you can regenerate them or stop using them, but for libraries you need to check. This is possibly the bottleneck of the method, but it is simple to verify whether the problem exists.

There is one empirical method that does not guarantee accuracy. When your application shuts down, a container is deallocated after its elements are deallocated. So you can record, per container type, how many elements of which type each container held.

For example, suppose we have:

vector<A>({1,2,3}) and map<string,A>({1,2}) and map<string,B>({1,2})

This will generate a deallocation event list like this:

B, B, map<string,B>,
A, A, map<string,A>,
A, A, A, vector<A>,

So you know that there were 3 elements of type A in vector<A>, 2 elements of type A in map<string,A>, and 2 elements of type B in map<string,B>.
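
As a less invasive variant of the same idea, the statistics can be collected by a custom allocator passed as a template argument instead of by editing std::allocator in the library headers. The following is only a minimal sketch under that assumption: CountingAllocator, AllocStats, alloc_stats and peak_slots_for_int_vector are hypothetical names that do not appear in the answer, C++11 is assumed, and the numbers are allocated slots (capacity for a vector, nodes for a set or map), not size().

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <typeinfo>
#include <vector>

// Hypothetical bookkeeping record: currently allocated and peak slot counts.
struct AllocStats {
    std::size_t live = 0;
    std::size_t peak = 0;
};

// One record per allocated type, keyed by the (mangled) type name.
// This map uses the default allocator, so it is not counted itself.
inline std::map<std::string, AllocStats>& alloc_stats() {
    static std::map<std::string, AllocStats> stats;
    return stats;
}

// Hypothetical allocator that forwards to std::allocator and counts slots.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        AllocStats& s = alloc_stats()[typeid(T).name()];
        s.live += n;
        if (s.live > s.peak) s.peak = s.live;
        return std::allocator<T>().allocate(n);
    }

    void deallocate(T* p, std::size_t n) {
        AllocStats& s = alloc_stats()[typeid(T).name()];
        assert(s.live >= n);
        s.live -= n;
        std::allocator<T>().deallocate(p, n);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Example use: returns the peak number of int slots allocated so far.
std::size_t peak_slots_for_int_vector() {
    std::vector<int, CountingAllocator<int>> v;
    v.reserve(8);
    for (int i = 0; i < 8; ++i) v.push_back(i);
    return alloc_stats()[typeid(int).name()].peak;
}
```

For node-based containers the allocator is rebound to the library's internal node type, so the key is the node type's name rather than the element type. Like the method above, this identifies types, not individual container instances.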

oklas
  • Changing std headers isn't kosher. – BitWhistler Feb 17 '16 at 08:10
  • 1
    It is kosher when it helps solve a sufficiently difficult problem, and it is only done in the development environment. It is kosher in production too, when it helps solve the problem and the change is rolled back as soon as possible. If changing std is not kosher, then developing a std lib, an IDE, or an OS is not kosher either. – oklas Feb 17 '16 at 08:29
2

If you can do minor edits, can you add every container to a big list of them?
Like this:

std::set<......> my_set;  // existing code
all_containers.add( &my_set ); // minor edit IMHO

Then you can call all_containers.analyse() that'd call size() on each of them and print the results.

You can use something like this:

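#include <cstddef>
#include <list>

// Type-erased interface so containers of different types can share one list.
struct ContainerStatsI {
  virtual ~ContainerStatsI() {}
  virtual std::size_t getSize() const = 0;
};
template<class T> struct ContainerStats : ContainerStatsI {
  T* p_;  // non-owning pointer to the registered container
  ContainerStats( T* p ) : p_(p) {}
  std::size_t getSize() const { return p_->size(); }
};
struct ContainerStatsList {
  std::list<ContainerStatsI*> list_; // or some other container....
  // The entries are allocated with new and never freed in this sketch.
  template<class T> void add( T* p ) {
    list_.push_back( new ContainerStats<T>(p) );
  }
  // you should probably add a remove(T* p) as well
  void analyse() {
    for( ContainerStatsI* p : list_ ) {
      p->getSize(); // do something with what's returned here
    }
  }
};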
BitWhistler
  • It is possible to use macros that pass to all_containers the file and line where the container is actually created. – oklas Feb 16 '16 at 16:34
  • You can do it with a macro for file:line or you can add some string name, or some other form of id, in the add method. – BitWhistler Feb 16 '16 at 18:11
  • Thanks for the idea. Adding `ContainerStats*` classes is very possible but adding the call to `add()` for each container is formidable. How do we address nested containers, e.g. a map from an int to a set? – Arun Feb 16 '16 at 19:17
  • You do not need to add more classes or `add` functions as it's templatized. You can call `add` on nested containers like on any other. The real challenge in your problem is generating container names you can actually identify later. This challenge does not go away with any solution. – BitWhistler Feb 16 '16 at 21:02
  • Container names can be generated by using macros like `__FILE__, __LINE__, __COUNTER__`. They may be folded into a macro wrapper, e.g. `ADD(arg)` which calls `add(arg)`. To me, the real challenge is calling `add()` on all the containers, there are *thousands* of them :) – Arun Feb 16 '16 at 21:15
  • file and line is just a point in the code. It can be creating many instances of the same thing.... – BitWhistler Feb 16 '16 at 21:26
  • So now, I guess you'll have to automate the calls to `add`. I'd do a find and replace in all files to change `map` to `my_map` and such. And then you can `add` and `remove` in ctor/dtor. Not the nicest thing to do, but easy to change to alias later – BitWhistler Feb 16 '16 at 21:33
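
The add/remove in constructor and destructor idea from the comments could look roughly like the sketch below. It is only an illustration under assumptions: Registry, Tracked, TRACK_LABEL and demo_largest are hypothetical names that are not part of the answer, C++11 is assumed, and the wrapper derives publicly from the standard container, which is fine only as long as a Tracked object is never deleted through a pointer to its base class.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical registry of all live tracked containers.
struct Registry {
    struct Entry {
        std::string label;
        std::function<std::size_t()> size;
    };
    std::map<const void*, Entry> live;

    static Registry& instance() {
        static Registry r;
        return r;
    }

    // Label and size of the largest live container; ("", 0) if none has elements.
    std::pair<std::string, std::size_t> largest() const {
        std::pair<std::string, std::size_t> best("", 0);
        for (const auto& kv : live) {
            const std::size_t n = kv.second.size();
            if (n > best.second) best = std::make_pair(kv.second.label, n);
        }
        return best;
    }
};

// Hypothetical label helper built from the usual preprocessor macros.
#define TRACK_LABEL (std::string(__FILE__) + ":" + std::to_string(__LINE__))

// Wrapper that registers itself on construction and unregisters on destruction.
template <class C>
class Tracked : public C {
public:
    Tracked(std::string label = "unnamed") : label_(std::move(label)) { attach(); }
    Tracked(const Tracked& other) : C(other), label_(other.label_) { attach(); }
    Tracked& operator=(const Tracked& other) {
        C::operator=(other);  // copy the elements, keep this object's own label
        return *this;
    }
    ~Tracked() {
        const std::size_t removed =
            Registry::instance().live.erase(static_cast<const void*>(this));
        assert(removed == 1);
        (void)removed;
    }

private:
    void attach() {
        const C* self = this;
        Registry::instance().live[static_cast<const void*>(this)] =
            Registry::Entry{label_, [self] { return self->size(); }};
    }
    std::string label_;
};

// Example with a nested container: a map from int to a tracked set.
std::pair<std::string, std::size_t> demo_largest() {
    Tracked<std::set<int>> small("small_set");
    Tracked<std::map<int, Tracked<std::set<int>>>> nested("outer_map");
    for (int i = 0; i < 3; ++i) small.insert(i);
    for (int i = 0; i < 10; ++i) nested[0].insert(i);  // inner set gets the default label
    return Registry::instance().largest();
}
```

Inner containers created by operator[] are default-constructed, so they only get the default label. That is the naming problem discussed above: a file and line, or a fixed string, identifies a point in the code rather than one instance.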
1

Add statistics code to the destructors of the containers in the std header files. This also does not require modifying a large amount of code in a big project, but it also shows only the container type (see my other answer here). The method does not require C++0x, C++11, or anything newer.

The first and mandatory step is to put your std library under source control, git for example, so that you can quickly see what has actually changed and quickly switch between the modified and the original version.

Place this declaration of the Stat class into the std library sources folder:

#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <typeinfo>

class Stat {
    std::map<std::string,std::size_t> total;    // sum of sizes at destruction, per key
    std::map<std::string,std::size_t> maximum;  // largest size at destruction, per key
public:
    // Called from a container destructor; T is the container's value_type.
    template<class T>
    void log( const std::string& cont, std::size_t size ) {
        std::string key = cont + ": " + typeid(T).name();
        if( maximum[key] < size ) maximum[key] = size;
        total[key] += size;
    }
    void show_result() {
        std::cout << "container type total maximum" << std::endl;
        std::map<std::string,std::size_t>::const_iterator it;
        for( it = total.begin(); it != total.end(); ++it ) {
            std::cout << it->first << " " << it->second
               << " " << maximum[it->first] << std::endl;
        }
    }
    static Stat& instance();
    ~Stat(){ show_result(); }
};

Define the Stat singleton in one of your project's cpp files:

Stat& Stat::instance() {
    static Stat stat;
    return stat;
}

Edit the std library container templates. Add statistics logging to the destructors.

// modify these standard template library sources:

template< class T, class Allocator = std::allocator<T> > class vector {
    ...
    // Keep the destructor non-virtual: a virtual destructor would add a
    // vtable pointer and change the layout of the container.
    ~vector() {
        Stat::instance().log<value_type>( "std::vector", this->size() );
    }
};

template< class Key, class T, class Compare = std::less<Key>,
    class Allocator = std::allocator<std::pair<const Key, T> > > class map {
    ...
    ~map() {
        Stat::instance().log<value_type>( "std::map", this->size() );
    }
};

Now consider the following example program:

int main() {
    {
        // C++0x is deliberately avoided; the project does not need that dependency
        std_vector<int> v1; for(int i=0;i<10;++i) v1.push_back( i );
        std_vector<int> v2; for(int i=0;i<10;++i) v2.push_back( i );
        std_map<int,std::string> m1; for(int i=0;i<10;++i) m1[i]="";
        std_map<int,std::string> m2; for(int i=0;i<20;++i) m2[i]="";
    }
    Stat::instance().show_result();
    return 0;
}

The result for gcc is:

container type total maximum
std::map: St4pairIiSsE 30 20
std::vector: i 20 10

If you need a more detailed type description, look up how type names are demangled in your development environment. For gcc, the conversion is described here: https://lists.gnu.org/archive/html/help-gplusplus/2009-02/msg00006.html

Output may be like this:

container type total maximum
std::map: std::pair<int, std::string> 30 20
std::vector: int 20 10
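
For g++ and clang++ on Linux, the conversion to readable type names can be done with abi::__cxa_demangle from <cxxabi.h>. A minimal sketch, assuming C++11 is acceptable for this helper; demangle and type_name are hypothetical helper names that are not part of the answer:

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>
#include <typeinfo>
#include <utility>

// Converts a mangled name, as returned by typeid(T).name() under the Itanium
// C++ ABI, into a readable one. Falls back to the input if demangling fails.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer that the caller must free.
    std::unique_ptr<char, void (*)(void*)> text(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
    return (status == 0 && text) ? std::string(text.get()) : std::string(mangled);
}

// Convenience wrapper: readable name of type T.
template <class T>
std::string type_name() {
    return demangle(typeid(T).name());
}
```

In Stat::log, the key could then be built with demangle(typeid(T).name()) instead of the raw typeid(T).name().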
oklas
  • Maybe possible but the dirtiest approach possible. – BitWhistler Feb 17 '16 at 08:15
  • 1
    What is really dirty is saying so without arguments. – oklas Feb 17 '16 at 08:34
  • Pay attention to your static data. It may be destroyed after the Stat instance has been destroyed, so such data may sometimes be invisible to Stat. – oklas Feb 18 '16 at 09:00
  • My apologies @oklas. You can change std headers all you like. I would not go this way because I only make changes I can keep and deploy, and this is not a change I'd keep as gcc/libc versions change, various boxes have different directory layout, not to mention different OS, etc – BitWhistler Feb 19 '16 at 16:43
  • Do not confuse a concrete problem of a project with std lib distribution. Nothing here has to be deployed to servers anyway. If you cannot go this way, do not go this way. – oklas Feb 19 '16 at 18:30