I've been looking for a custom polymorphic solution to improve binary compatibility. The problem is that pointer members vary in size across platforms, so even "static" width members get pushed around, producing binary-incompatible layouts.
The way I understand it, most compilers implement v-tables in a similar way:
 __________________
| |         |      |
|0| vtable* | data | -> ->
|_|_________|______|
I.e. the v-table pointer is put as the first element of the object, and in the case of multiple inheritance the inherited subobjects are laid out sequentially, with the appropriate padding for alignment.
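To make the problem concrete: with that usual layout, something like the following (WidgetTypicalLayout is a made-up example, and of course the exact layout is implementation-defined) has different member offsets on 32-bit and 64-bit targets:

#include <stdint.h>

/* Rough equivalent of what a typical compiler produces -- not guaranteed
   by any standard, just an illustration of the problem.                  */
struct WidgetTypicalLayout {
    void   *vptr;   /* 4 or 8 bytes depending on the platform...          */
    int32_t id;     /* ...so these fixed-width members start at offset 4  */
    int32_t flags;  /* on a 32-bit target and offset 8 on a 64-bit one,   */
};                  /* and the overall layout is not binary-compatible.   */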
So, my idea is the following: put all v-table pointers (and all members of varying "platform width") "behind" the object:
    __________________
   |         | |      |
<- | vtable* |0| data | ->
   |_________|_|______|
This way the layout to the right of the 0 (the alignment boundary for the first data member) consists only of types with explicit, portable size and alignment, so it can stay uniform across platforms. At the same time you can still portably navigate through the v-tables and other pointer members using indices with the platform's pointer-width stride. Also, since all members to the left are the same size and alignment, this could reduce the need for extra padding in the layout.
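In hand-rolled C terms I picture it roughly like this (WidgetData, SLOT and the other names are made up, this is only a sketch of the idea):

#include <stdint.h>

/* Fixed-width part -- everything to the right of the boundary.  Only
   explicitly sized types, so its layout is the same on every platform.   */
typedef struct {
    int32_t id;
    int32_t flags;
    int64_t value;
} WidgetData;

/* The object pointer handed to user code points at the WidgetData part.
   Pointer-width slots (v-tables, other pointer members) sit behind it
   and are reached with negative indices at the platform pointer stride.  */
#define SLOT(obj, i) (((void **)(obj))[-(i) - 1])

enum { SLOT_VTABLE = 0, SLOT_COUNT = 1 };   /* slot 0 holds the v-table */

So SLOT(w, SLOT_VTABLE) walks backwards in pointer-sized steps, while offsetof(WidgetData, value) is the same number everywhere.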
Naturally, this means that the "this" pointer will no longer point at the beginning of the object, but at some offset which will vary with every class. Which means that "new" and "delete" must make adjustments (roughly the kind of adjustment sketched below) in order for the whole scheme to work. Would that have a measurable negative impact, considering that one way or another, offset calculation takes place when accessing members anyway?
My question is whether someone with more experience can point out potential caveats of using this approach.
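For reference, the kind of adjustment in "new"/"delete" I have in mind is on the order of this (continuing the sketch above; widget_new/widget_delete are made-up names, error handling is minimal and alignment padding is glossed over):

#include <stdlib.h>

/* Allocate the pointer-width slots in front of the fixed-width data and
   hand back a pointer to the data part -- the "this" of the scheme.      */
void *widget_new(const void *vtable)
{
    void **base = malloc(SLOT_COUNT * sizeof(void *) + sizeof(WidgetData));
    if (!base)
        return NULL;
    void *obj = base + SLOT_COUNT;        /* adjust: one add of a constant */
    SLOT(obj, SLOT_VTABLE) = (void *)vtable;
    return obj;
}

void widget_delete(void *obj)
{
    if (obj)
        free((void **)obj - SLOT_COUNT);  /* adjust back: one subtraction  */
}

In real code the slot area would have to be padded up to the alignment of the data part, but that doesn't change the fact that the adjustment itself is just adding or subtracting a compile-time constant.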
Edit:
I did a quick test to determine whether the extra offset calculation would be detrimental to the performance of virtual calls (yeah yeah, I know it is C++ code inside a C question, but I don't have a nanosecond-resolution timer for C, plus the whole point is to compare against the existing polymorphism implementation):
#include <QVector>
#include <QElapsedTimer>
#include <iostream>

using std::cout;
using std::endl;

class A;
typedef void (*foo)(A*);
void bar(A*) {}

class A {
public:
    A() : a(&bar) { }
    foo a;              // function pointer member, not at offset 0
    virtual void b() {} // regular virtual function for comparison
};

int main() {
    QVector<A*> c;
    int r = 60000000;
    QElapsedTimer t;
    t.start();
    for (int i = 0; i < r; ++i) c.append(new A);
    cout << "allocated " << c.size() << " objects in " << (quint64)t.elapsed() << endl;
    for (int count = 0; count < 5; ++count) {
        t.restart();
        for (int i = 0; i < r; ++i) {
            A * ap = c[i]; // note that c[i]->a(c[i]) would
            ap->a(ap);     // actually result in a performance hit
        }
        cout << t.elapsed() << endl;
        t.restart();
        for (int i = 0; i < r; ++i) {
            c[i]->b();
        }
        cout << t.elapsed() << endl;
    }
}
After testing with 60 million objects (70 million failed to allocate on the 32-bit compiler I am currently using), it doesn't look like there is any measurable difference between calling a regular virtual function and calling through a pointer that is not the first element in the object (and therefore needs additional offset calculation), even though in the case of the function pointer the memory address is passed twice (once to find the offset of "a" and then again as the argument to "a"). In release mode the times for the two calls are identical (+/- 1 nsec for 60 mil calls), and in debug mode the function pointer is actually about 1% faster, consistently (maybe a function pointer requires fewer resources than a virtual function).
The overhead from adjusting the pointer when allocating and deleting also seems to be practically negligible and totally within the margin of error. Which is kind of expected, considering it should add no more than a single increment by an immediate of a value that is already in a register, something that should take a single cycle on the platforms I intend to target.