I've been looking for a custom polymorphic solution to improve binary compatibility. The problem is that pointer members vary in size across platforms, so even "static" width members get pushed around, producing binary-incompatible layouts.
The way I understand it, most compilers implement v-tables in a similar way:
 __________________
| |         |      |
|0| vtable* | data | -> ->
|_|_________|______|
I.e. the v-table pointer is put as the first element of the object, and in the case of multiple inheritance the inherited subobjects are laid out sequentially, with the appropriate padding for alignment.
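To make the problem concrete: with that usual layout, something like the following (WidgetTypicalLayout is a made-up example, and of course the exact layout is implementation-defined) has different member offsets on 32-bit and 64-bit targets:

#include <stdint.h>

/* Rough equivalent of what a typical compiler produces -- not guaranteed
   by any standard, just an illustration of the problem.                  */
struct WidgetTypicalLayout {
    void   *vptr;   /* 4 or 8 bytes depending on the platform...          */
    int32_t id;     /* ...so these fixed-width members start at offset 4  */
    int32_t flags;  /* on a 32-bit target and offset 8 on a 64-bit one,   */
};                  /* and the overall layout is not binary-compatible.   */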
So, my idea is the following: put all v-table pointers (and all members of varying "platform width") "behind" the object:
    __________________
   |         | |      |
<- | vtable* |0| data | ->
   |_________|_|______|
This way the layout to the right of the 0 (the alignment boundary for the first data member) consists only of types with explicit, portable size and alignment, so it can stay uniform across platforms. At the same time you can still portably navigate through the v-tables and other pointer members using indices with the platform's pointer-width stride. Also, since all members to the left are the same size and alignment, this could reduce the need for extra padding in the layout.
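In hand-rolled C terms I picture it roughly like this (WidgetData, SLOT and the other names are made up, this is only a sketch of the idea):

#include <stdint.h>

/* Fixed-width part -- everything to the right of the boundary.  Only
   explicitly sized types, so its layout is the same on every platform.   */
typedef struct {
    int32_t id;
    int32_t flags;
    int64_t value;
} WidgetData;

/* The object pointer handed to user code points at the WidgetData part.
   Pointer-width slots (v-tables, other pointer members) sit behind it
   and are reached with negative indices at the platform pointer stride.  */
#define SLOT(obj, i) (((void **)(obj))[-(i) - 1])

enum { SLOT_VTABLE = 0, SLOT_COUNT = 1 };   /* slot 0 holds the v-table */

So SLOT(w, SLOT_VTABLE) walks backwards in pointer-sized steps, while offsetof(WidgetData, value) is the same number everywhere.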
Naturally, this means that the "this" pointer will no longer point at the beginning of the object, but at some offset which will vary with every class. Which means that "new" and "delete" must make adjustments (roughly the kind of adjustment sketched below) in order for the whole scheme to work. Would that have a measurable negative impact, considering that one way or another, offset calculation takes place when accessing members anyway?
My question is whether someone with more experience can point out potential caveats of using this approach.
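For reference, the kind of adjustment in "new"/"delete" I have in mind is on the order of this (continuing the sketch above; widget_new/widget_delete are made-up names, error handling is minimal and alignment padding is glossed over):

#include <stdlib.h>

/* Allocate the pointer-width slots in front of the fixed-width data and
   hand back a pointer to the data part -- the "this" of the scheme.      */
void *widget_new(const void *vtable)
{
    void **base = malloc(SLOT_COUNT * sizeof(void *) + sizeof(WidgetData));
    if (!base)
        return NULL;
    void *obj = base + SLOT_COUNT;        /* adjust: one add of a constant */
    SLOT(obj, SLOT_VTABLE) = (void *)vtable;
    return obj;
}

void widget_delete(void *obj)
{
    if (obj)
        free((void **)obj - SLOT_COUNT);  /* adjust back: one subtraction  */
}

In real code the slot area would have to be padded up to the alignment of the data part, but that doesn't change the fact that the adjustment itself is just adding or subtracting a compile-time constant.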
Edit:
I did a quick test to determine whether the extra offset calculation would be detrimental to the performance of virtual calls (yeah yeah, I know it is C++ code inside a C question, but I don't have a nanosecond-resolution timer for C, plus the whole point is to compare against the existing polymorphism implementation):
#include <QVector>
#include <QElapsedTimer>
#include <iostream>

using std::cout;
using std::endl;

class A;
typedef void (*foo)(A*);
void bar(A*) {}

class A {
public:
    A() : a(&bar) { }
    foo a;              // function pointer member, not at offset 0
    virtual void b() {} // regular virtual function for comparison
};

int main() {
    QVector<A*> c;
    int r = 60000000;
    QElapsedTimer t;
    t.start();
    for (int i = 0; i < r; ++i) c.append(new A);
    cout << "allocated " << c.size() << " objects in " << (quint64)t.elapsed() << endl;
    for (int count = 0; count < 5; ++count) {
        t.restart();
        for (int i = 0; i < r; ++i) {
            A * ap = c[i]; // note that c[i]->a(c[i]) would
            ap->a(ap);     // actually result in a performance hit
        }
        cout << t.elapsed() << endl;
        t.restart();
        for (int i = 0; i < r; ++i) {
            c[i]->b();
        }
        cout << t.elapsed() << endl;
    }
}
After testing with 60 million objects (70 million failed to allocate on the 32-bit compiler I am currently using), it doesn't look like there is any measurable difference between calling a regular virtual function and calling through a pointer that is not the first element in the object (and therefore needs additional offset calculation), even though in the case of the function pointer the memory address is passed twice (once to find the offset of "a" and then again as the argument to "a"). In release mode the times for the two calls are identical (+/- 1 nsec for 60 mil calls), and in debug mode the function pointer is actually about 1% faster, consistently (maybe a function pointer requires fewer resources than a virtual function).
The overhead from adjusting the pointer when allocating and deleting also seems to be practically negligible and totally within the margin of error. Which is kind of expected, considering it should add no more than a single increment by an immediate of a value that is already in a register, something that should take a single cycle on the platforms I intend to target.