
(This is quite a large question about software design. In case it's not suited for StackOverflow I'm willing to copy it to the Software-Engineering community)

I'm working with heap_stat, a script that analyses dumps. This script is based on the idea that, for any object which has a virtual function, the vftable field is always the first one (which makes it possible to find the memory address of the object's class).

In my applications there are some objects that have vftable entries (typically every STL object has it), but there are also quite a few objects that don't.

In order to force the presence of a vftable field, I've done the following test:

Create a nonsense class, having a virtual function, and let my class inherit from this nonsense class:

class NONSENSE {
    virtual int nonsense() { return 0; }
};

class Own_Class : public NONSENSE, ...

This, as expected, created a vftable entry in the symbols, which I could find (using Windbg's x /2 *!Own_Class*vftable* command):

00000000`012da1e0 Own_Application!Own_Class::`vftable'

I also saw a difference in memory usage:

sizeof(a normal Own_Class object)  = 2928
sizeof(inherited Own_Class object) = 2936

=> 8 bytes have been added for this object.
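
The 8-byte difference matches the size of one pointer in a 64-bit build. A minimal sketch, assuming a common ABI (MSVC or GCC/Clang) where every polymorphic object carries one hidden vtable pointer; `Plain`, `Polymorphic` and `vptr_overhead` are hypothetical names, not taken from the application:

```cpp
#include <cstddef>

// Hypothetical class without any virtual function.
struct Plain {
    void* payload[4];
};

// Same data plus one virtual function: on common ABIs the compiler adds a
// hidden vtable pointer to every object of this class.
struct Polymorphic {
    virtual int nonsense() { return 0; }
    void* payload[4];
};

// Extra bytes per object caused by the hidden pointer (implementation-specific).
constexpr std::size_t vptr_overhead() {
    return sizeof(Polymorphic) - sizeof(Plain);
}
```

On a 64-bit build `vptr_overhead()` is expected to be 8, which is the increase observed above; on a 32-bit build it would be 4.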

There's a catch: apparently quite a few classes are defined as:

class ATL_NO_VTABLE Own_Class

This ATL_NO_VTABLE blocks the creation of the vftable entry, which means the following (ATL_NO_VTABLE equals __declspec(novtable)):

// __declspec(novtable) is used on a class declaration to prevent the vtable
// pointer from being initialized in the constructor and destructor for the
// class.  This has many benefits because the linker can now eliminate the
// vtable and all the functions pointed to by the vtable.  Also, the actual
// constructor and destructor code are now smaller.

In my opinion, this means that the vftable does not get created, so object methods get called more directly, which affects the speed of method execution and stack handling. Allowing the vftable to be created has the following impact:

Not to be taken into account:

  • There is one more call on the stack. This only matters for systems which are already at the limit of their memory usage. (I have no idea how the linker points to a particular method.)
  • The CPU usage increase will be too small to be seen.
  • The speed decrease will be too small to be seen.

To be taken into account:

  • As mentioned before, the memory usage of the application increases by 8 bytes per object. For a regular object of some 1000 bytes this is an increase of about 1%, but for objects of 80 bytes or less it can be an increase of 10% or more.

Now I have the following questions:
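
The percentages above follow from dividing the pointer size by the object size. A small sketch of that arithmetic, assuming an 8-byte pointer; `vptr_overhead_percent` is a hypothetical helper:

```cpp
// Relative memory overhead, in percent, of adding one vtable pointer
// (8 bytes assumed by default) to an object of the given original size.
constexpr double vptr_overhead_percent(double object_size,
                                       double pointer_size = 8.0) {
    return 100.0 * pointer_size / object_size;
}
```

For the 2928-byte `Own_Class` measured earlier this gives roughly 0.27%, for a 1000-byte object 0.8%, and for an 80-byte object 10%.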

  1. Is my analysis on the impact correct?
  2. Is there a better way to force the creation of the vftable field, having less impact?
  3. Did I miss anything?

Thanks in advance

Dominique
  • "_every STL object has it_" what do you mean? – curiousguy Feb 04 '19 at 02:53
  • @curiousguy: I mean that, in my application, there are some `CMap`, `CArray`, ..., objects, and launching `x /2 *!CArray*vftable*` is giving results. – Dominique Feb 04 '19 at 07:07
  • STL means "Standard Template Library". It's a framework made by HP and taken over by SGI, with class templates acting as sequences, associative containers, iterators... Are your classes designed to function like STL containers?  – curiousguy Feb 04 '19 at 07:19
  • @curiousguy: No: the only thing about STL containers is that they have virtual functions, I'd like my classes also to have those, and I'd like to know the impact of that. – Dominique Feb 04 '19 at 07:59
  • Which containers are you talking about, exactly? STL containers usually don't have any virtual function and have a lot of inline functions. – curiousguy Feb 04 '19 at 09:57
  • I'm using `CMap`, `CMapPtrToString` and similar, `CArray`, `CArray` of `CMap`, ... – Dominique Feb 04 '19 at 10:04
  • IMHO, this could be an XY problem. You want a vftable because you want to use heap_stat. That's ok so far. But why do you want to use heap_stat? What's the root cause that you want to solve? Memory leaks? If so, it may happen that heap_stat is the wrong approach. The thing is (as far as I understood): heap_stat works on a single crash dump. To identify memory leaks, you'd better have two or more crash dumps and compare them against each other. There are much better tools doing that. – Thomas Weller Feb 04 '19 at 10:29
  • @ThomasWeller: you might be right, but heap_stat is the tool we're currently using in our company (and indeed, we're comparing the results of two heap_stat analyses). If, however, you know about better tools for analysing memory leaks, I'm always open for new ideas. – Dominique Feb 04 '19 at 10:36
  • So you don't use any commercial tool, e.g. a C++ memory profiler? What about UMDH? My basic rule is: don't change the code for debugging reasons. Like we no longer insert printf-statements into our code. It's too easy to forget. And once your memory leak is solved, you would like to undo all these changes. What's your tool chain? – Thomas Weller Feb 04 '19 at 10:44
  • @ThomasWeller: we're not using a C++ memory profiler or UMDH. Our situation is: we develop software, which is very complex, and it gets installed on customers' systems. The customer might observe memory leaks, take dumps and send them to us. We can't intervene on customers' systems (like setting GFlags, ...), hence UMDH is not an option. We also don't have the possibility to do remote debugging. As far as we know, heap_stat is the only tool where you can do a memory leak analysis, starting from memory dumps. – Dominique Feb 04 '19 at 11:02
  • "Customer observes memory leaks" is a point where a lot of things can go wrong. How does he identify the leak? Using task manager? Using Process Explorer? If he has enough experience to choose Process Explorer, he still does not see the value of GFlags and others? If he uses Task Manager, you likely don't have a memory leak. – Thomas Weller Feb 08 '19 at 09:39
  • @ThomasWeller: in my company we have developed a reliable tool for checking memory usage and detecting memory leaks. – Dominique Feb 08 '19 at 13:26
  • @Dominique, I am also interested in such a tool, but indeed having to modify the source code just to be able to force dump symbols is not ideal, especially for existing large code bases. – Juan Gonzalez Burgos Apr 19 '21 at 17:33

2 Answers


Is my analysis on the impact correct?

No. __declspec(novtable) only omits generation of the vtable itself for a given class (more precisely, it suppresses the code that initializes the vtable pointer in that class's constructor and destructor). The pointer to the vtable still exists in every object, so sizeof does not change.

__declspec(novtable) is meant to be used on base classes that have derived classes. The constructor of the derived class sets the vtable pointer to the derived vtable, so the base vtable is not needed.

So this optimization eliminates one pointer assignment (in the generated part of the constructor code) and a bit of space for the vtable itself. It is not very useful for your goal of a per-object optimization, as it is only a small per-class optimization.

It works as long as you don't create instances of the base class on their own and don't call virtual methods in its constructor or destructor.

Omitting virtual function calls by making them non-virtual is a completely separate story. It is called devirtualization: when the compiler can be sure which class an instance belongs to, it replaces virtual calls with non-virtual ones.

__declspec(novtable) does not help devirtualization in any way. The final / sealed keywords may help devirtualization, as they state that there are no further derived classes or overriding methods.
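
A minimal sketch of the intended use, under stated assumptions: `NO_VTABLE` is a hypothetical macro (ATL's own is `ATL_NO_VTABLE`) that expands to `__declspec(novtable)` on MSVC-compatible compilers and to nothing elsewhere, and `Shape` / `Square` are made-up classes:

```cpp
// __declspec(novtable) is MSVC-specific; on other compilers the hypothetical
// NO_VTABLE macro expands to nothing so the sketch still compiles.
#if defined(_MSC_VER)
#  define NO_VTABLE __declspec(novtable)
#else
#  define NO_VTABLE
#endif

// Abstract base that is never instantiated on its own and calls no virtual
// function from its constructor or destructor.
struct NO_VTABLE Shape {
    virtual ~Shape() {}
    virtual int sides() const = 0;
};

// Concrete class: its constructor stores the pointer to Square's vtable,
// which is why the vtable of Shape is never needed.
struct Square : Shape {
    int sides() const override { return 4; }
};

// The vtable pointer is still part of every object, with or without NO_VTABLE.
static_assert(sizeof(Shape) >= sizeof(void*),
              "the vtable pointer is still stored in the object");
```

The object layout is the same whether or not the attribute is present; only the constructor and destructor code of `Shape` and the emitted vtable differ.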
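
As an illustration of how `final` gives the compiler that guarantee, here is a sketch with hypothetical classes `Handler` and `FileHandler`. Whether a direct call is actually emitted depends on the compiler and its optimization settings:

```cpp
#include <type_traits>

struct Handler {
    virtual ~Handler() {}
    virtual int priority() const { return 0; }
};

// 'final' promises that no class derives from FileHandler.
struct FileHandler final : Handler {
    int priority() const override { return 3; }
};

// Static type is Handler: the call goes through the vtable unless the
// compiler can prove the dynamic type some other way.
inline int priority_via_base(const Handler& h) { return h.priority(); }

// Static type is the final class FileHandler: no other override can exist,
// so the compiler may replace the virtual call with a direct one.
inline int priority_via_final(const FileHandler& h) { return h.priority(); }

static_assert(std::is_final<FileHandler>::value,
              "FileHandler cannot be derived from");
```

Both functions return the same value for a `FileHandler` object; the difference is only in the code the compiler is allowed to generate.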

Regarding the assumption that the vtable pointer is the first member: this may be wrong. The vtable pointer will not be first if your base classes don't have a vtable but do have some data member. Also, there may be more than one vtable pointer.

To analyze structures in a dump programmatically, I would recommend using a proper API. There are two: the DIA SDK and the dbghelp functions. They are similar, but the first one is object-based (COM) and the second is just a flat API, so the first may be easier to use.

As the approach with the heap_stat script is inherently limited, I would recommend using UMDH for heap analysis instead, which does not rely on the vtable at all and shows all kinds of objects.
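
The "more than one vtable pointer" case can be sketched as follows, assuming a common ABI (MSVC or GCC/Clang); the class names are illustrative:

```cpp
// Two unrelated polymorphic base classes, each with its own vtable pointer.
struct PolyBase1 { virtual ~PolyBase1() {} };
struct PolyBase2 { virtual ~PolyBase2() {} };

// D contains one subobject per base, so it carries two vtable pointers.
// Only the first sits at the start of the object.
struct D : PolyBase1, PolyBase2 {};

static_assert(sizeof(D) >= 2 * sizeof(void*),
              "one vtable pointer per polymorphic base");
```

A tool that only inspects the first pointer-sized field of an allocation would see just one of the two pointers in a `D` object.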

Alex Guteniev
  • What makes you think heap_stat is not using DIA SDK or dbghelp functions? – Thomas Weller Feb 04 '19 at 14:26
  • @ThomasWeller, since it is a WinDbg script, I think it uses WinDbg commands. Sure, underneath WinDbg will use some of those. But for finer analysis it is better to use the APIs directly. – Alex Guteniev Feb 04 '19 at 14:33
  • heap_stat.py doesn't use any functions; it parses the text output of a WinDbg command ... by importing the subprocess module the results can be obtained without having to install pykd https://imgur.com/a/QGYkydN – blabb Feb 04 '19 at 18:34
  • There must be at least one vptr for each polymorphic base class derived separately: `class D : PolyBase1, PolyBase2, PolyBase3` has 3 vptrs if each base is polymorphic. More when virtual inheritance is involved. – curiousguy Feb 05 '19 at 12:44


In the meantime, I've found a terribly easy way to force vftable entries for every class: just declare every destructor as virtual.

In order to find all destructors that are not virtual yet, I've launched the following command in my Ubuntu app within my development directory:
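
A minimal sketch of the effect, using the standard `std::is_polymorphic` trait; `WithoutVirtual` and `WithVirtualDtor` are hypothetical classes, not taken from the application:

```cpp
#include <type_traits>

// Ordinary class: no virtual function, hence no vtable pointer.
struct WithoutVirtual {
    ~WithoutVirtual() {}
    int value = 0;
};

// Same class with a virtual destructor: it becomes polymorphic, so the
// compiler emits a vtable and stores a pointer to it in every object.
struct WithVirtualDtor {
    virtual ~WithVirtualDtor() {}
    int value = 0;
};

static_assert(!std::is_polymorphic<WithoutVirtual>::value,
              "no virtual function, no vtable");
static_assert(std::is_polymorphic<WithVirtualDtor>::value,
              "a virtual destructor is enough to make the class polymorphic");
```

The per-object cost is the same as with the NONSENSE base class: one extra pointer per object.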

find ./ -name "*.h" -exec fgrep "~" {} /dev/null \; | grep -v "virtual"

After having declared all destructors as virtual, I'm planning to do some performance testing. I believe that declaring a method as virtual might have an impact on speed, since the method declaration has changed, especially for a server application under heavy load. I'll keep this post up to date.

Dominique