C++ Techniques: Type-Erasure vs. Pure Polymorphism

Question

What are the advantages/disadvantages of the two techniques in comparison ? And more importantly: Why and when should one be used over the other ? Is it just a matter of personal taste/preference ?

To the best of my abilities, I haven't found another post that explicitly addresses my question. Among many questions regarding the actual use of polymorphism and/or type-erasure, the following seems to be closest, or so it seemed, but it doesn't really address my question either:

C++ -& CRTP . Type erasure vs polymorphism

Please, note that I very well understand both techniques. To this end, I provide a simple, self-contained, working example below, which I'm happy to remove, if it is felt unnecessary. However, the example should clarify what the two techniques mean with respect to my question. I'm not interested in discussing nomenclatures. Also, I know the difference between compile- and run-time polymorphism, though I wouldn't consider this to be relevant to the question. Note that my interest is less in performance-differences, if there are any. However, if there was a striking argument for one or the other based on performance, I'd be curious to read it. In particular, I would like to hear about concrete examples (no code) that would really only work with one of the two approaches.

Looking at the example below, one primary difference is the memory-management, which for polymorphism remains on the user-side, and for type-erasure is neatly tucked away requiring some reference-counting (or boost). Having said that, depending on the usage scenarios, the situation might be improved for the polymorphism-example by using smart-pointers with the vector (?), though for arbitrary cases this may very well turn out to be impractical (?). Another aspect, potentially in favor of type-erasure, may be the independence of a common interface, but why exactly would that be an advantage (?).

The code as given below was tested (compiled & run) with MS VisualStudio 2008 by simply putting all of the following code-blocks into a single source-file. It should also compile with gcc on Linux, or so I hope/assume, because I see no reason why not (?) :-) I have split/divided the code here for clarity.

These header-files should be sufficient, right (?).

#include <iostream>
#include <vector>
#include <string>

Simple reference-counting to avoid boost (or other) dependencies. This class is only used in the type-erasure-example below.

class RefCount
{
  RefCount( const RefCount& );
  RefCount& operator= ( const RefCount& );
  int m_refCount;

  public:
    RefCount() : m_refCount(1) {}
    void Increment() { ++m_refCount; }
    int Decrement() { return --m_refCount; }
};

This is the simple type-erasure example/illustration. It was copied and modified in part from the following article. Mainly I have tried to make it as clear and straightforward as possible. http://www.cplusplus.com/articles/oz18T05o/

class Object {
  struct ObjectInterface {
    virtual ~ObjectInterface() {}
    virtual std::string GetSomeText() const = 0;
  };

  template< typename T > struct ObjectModel : ObjectInterface {
    ObjectModel( const T& t ) : m_object( t ) {}
    virtual ~ObjectModel() {}
    virtual std::string GetSomeText() const { return m_object.GetSomeText(); }
    T m_object;
 };

  void DecrementRefCount() {
    if( mp_refCount->Decrement()==0 ) {
      delete mp_refCount; delete mp_objectInterface;
      mp_refCount = NULL; mp_objectInterface = NULL;
    }
  }

  Object& operator= ( const Object& );
  ObjectInterface *mp_objectInterface;
  RefCount *mp_refCount;

  public:
    template< typename T > Object( const T& obj )
      : mp_objectInterface( new ObjectModel<T>( obj ) ), mp_refCount( new RefCount ) {}
    ~Object() { DecrementRefCount(); }

    std::string GetSomeText() const { return mp_objectInterface->GetSomeText(); }

    Object( const Object &obj ) {
      obj.mp_refCount->Increment(); mp_refCount = obj.mp_refCount;
      mp_objectInterface = obj.mp_objectInterface;
    }
};

struct MyObject1 { std::string GetSomeText() const { return "MyObject1"; } };
struct MyObject2 { std::string GetSomeText() const { return "MyObject2"; } };

void UseTypeErasure() {
  typedef std::vector<Object> ObjVect;
  typedef ObjVect::const_iterator ObjVectIter;

  ObjVect objVect;
  objVect.push_back( Object( MyObject1() ) );
  objVect.push_back( Object( MyObject2() ) );

  for( ObjVectIter iter = objVect.begin(); iter != objVect.end(); ++iter )
    std::cout << iter->GetSomeText();
}

As far as I'm concerned, this seems to achieve pretty much the same using polymorphism, or maybe not (?).

struct ObjectInterface {
  virtual ~ObjectInterface() {}
  virtual std::string GetSomeText() const = 0;
};

struct MyObject3 : public ObjectInterface {
  std::string GetSomeText() const { return "MyObject3"; } };

struct MyObject4 : public ObjectInterface {
  std::string GetSomeText() const { return "MyObject4"; } };

void UsePolymorphism() {
  typedef std::vector<ObjectInterface*> ObjVect;
  typedef ObjVect::const_iterator ObjVectIter;

  ObjVect objVect;
  objVect.push_back( new MyObject3 );
  objVect.push_back( new MyObject4 );

  for( ObjVectIter iter = objVect.begin(); iter != objVect.end(); ++iter )
    std::cout << (*iter)->GetSomeText();

  for( ObjVectIter iter = objVect.begin(); iter != objVect.end(); ++iter )
    delete *iter;
}

And finally for testing all of the above together.

int main() {
  UseTypeErasure();
  UsePolymorphism();
  return(0);
}

Did you ever had a look at Adobe's `poly` class or Boost.TypeErasure (accepted but not yet released) ? Besides CRTP those implement "concept-based" polymorphism that has proper value semantics. — J.N., Nov 09 '12 at 14:38
In my book, both "polymorphism" and "type erasure" aren't *paradigms* as much as they are simply tools in the language's tool box. Moreover, I can't think of a way of having type erasure *without* also using polymorphism, so I'm even more confused about what you're after. — Kerrek SB, Nov 09 '12 at 14:41
J.N., thanks a lot, I'll check that out in due course. In principle, I'm all for re-using, in particular, when working on non-commercial or private code, however, for commercial code I always balance the decision against having less external dependencies, but that's another discussion... :-) In any case, I'm eager to learn more... — Dr.D., Nov 09 '12 at 14:50
If you want my take on this: Use inheritance polymorphism only when you need a related collection of types *which can only be decided at runtime*, such as message packets coming from network I/O, or epoll events, or runtime-selected class registries/factories, etc. Use type erasure if you absolutely need a fixed, single type to handle a heterogeneous set of things, as is the case with `std::shared_ptr`, `std::function` or `boost::any`. Do everything else at compile time. — Kerrek SB, Nov 09 '12 at 14:54
Kerrek, as I wrote and with all due respect, I'm not interested in discussing nomenclatures, call it tools, approaches, some way of doing something... I'm well aware that some use of polymorphism is "hidden" in the type-erasure-example, and I've considered to put it into my question, but that's not the point. The two examples present two ways of achieving very similar results (IMHO) - is one superior to the other (?), does it depend on the usage scenario (?), or is it all just personal preference and "semantic sugar" (?). Thanks for the second comment. — Dr.D., Nov 09 '12 at 15:00
@Dr.D.: That's why I wrote the second comment. Your question sounds to me like "Is X or Y the better solution?", which I don't find terribly useful. The only question that matters is "How do I solve Problem Z?". My second comment contains the only "absolute" advice I can think of, and all other choices come down to picking the right tool to solve your actual problem. (Although if pressed I'd argue that any code that starts with `class Object` is suspect :-).) — Kerrek SB, Nov 09 '12 at 15:12
`UseTypeErasure()` calls `virtual std::string GetSomeText()` which uses a vtable. Is there type erasure in somewhere that I don't see? What exactly does this technique buy you? — n. m. could be an AI, Nov 09 '12 at 16:27

Yakk - Adam Nevraumont · Accepted Answer · 2012-11-09T18:15:30.223

C++ style virtual method based polymorphism:

You have to use classes to hold your data.
Every class has to be built with your particular kind of polymorphism in mind.
Every class has a common binary-level dependency, which restricts how the compiler creates the instance of each class.
The data you are abstracting must explicitly describe an interface that describes your needs.

C++ style template based type erasure (with virtual method based polymorphism doing the erasure):

You have to use template to talk about your data.
Each chunk of data you are working on may be completely unrelated to other options.
The type erasure work is done within public header files, which bloats compile time.
Each type erased has its own template instantiated, which can bloat binary size.
The data you are abstracting need not be written as being directly dependent on your needs.

Now, which is better? Well, that depends if the above things are good or bad in your particular situation.

As an explicit example, std::function<...> uses type erasure which allows it to take function pointers, function references, output of a whole pile of template-based functions that generate types at compile time, myraids of functors which have an operator(), and lambdas. All of these types are unrelated to one another. And because they aren't tied to having a virtual operator(), when they are used outside of the std::function context the abstraction they represent can be compiled away. You couldn't do this without type erasure, and you probably wouldn't want to.

On the other hand, just because a class has a method called DoFoo, doesn't mean that they all do the same thing. With polymorphism, it isn't just any DoFoo you are calling, but the DoFoo from a particular interface.

As for your sample code... your GetSomeText should be virtual ... override in the polymorphism case.

There is no need to reference count just because you are using type erasure. There is no need not to use reference counting just because you are using polymorphsm.

Your Object could wrap T*s like how you stored vectors of raw pointers in the other case, with manual destruction of their contents (equivalent to having to call delete). Your Object could wrap a std::shared_ptr<T>, and in the other case you could have vector of std::shared_ptr<T>. Your Object could contain a std::unique_ptr<T>, equivalent to having a vector of std::unique_ptr<T> in the other case. Your Object's ObjectModel could extract copy constructors and assignment operators from the T and expose them to Object, allowing full-on value semantics for your Object, which corresponds to the a vector of T in your polymorphism case.

Managu · Answer 2 · 2012-11-09T15:48:11.613

Here's one view: The question seems to ask how one should choose between late binding ("runtime polymorphism") and early binding ("compile-time polymorphism").

As KerrekSB points out in his comments, there are some things you can do with late binding that it just isn't realistic to do with early binding. Many uses of the Strategy pattern (decoding network I/O) or the Abstract Factory pattern (runtime-selected class factories) fall into this category.

If both approaches are viable, then choosing is a matter of the trade offs involved. In C++ applications, the main tradeoffs I see between early and late binding are implementation maintainability, binary size, and performance.

There are at least some people who feel that C++ templates in any shape or form are impossible to comprehend. Or possibly have some other, less dramatic reservation with templates. C++ templates have many little gotchas ("when do I need to use the 'typename' and 'template' keywords?"), and non-obvious tricks (SFINAE comes to mind).

Another tradeoff is optimization. When you bind early, you give the compiler more information about your program, and so it can (potentially) do a better job optimizing. When you bind late, the compiler (probably) doesn't know ahead of time as much information -- some of that information may be in other compilation units, and so the optimizer can't do as much.

Another tradeoff is program size. In C++ at least, using "compile-time polymorphism" sometimes balloons binary size, as the compiler creates, optimizes, and emits different code for each used specialization. In contrast, when binding late, there's only one code path.

It's interesting to compare the same tradeoff being made in a different context. Take web applications, where one uses (some type of) polymorphism to deal with differences between browsers, and possibly for internationalization (i18n)/localization. Now, a hand-written JavaScript web application would likely use what amounts to late binding here, by having methods which detect capabilities at runtime to figure out what to do. Libraries like jQuery take this tack.

Another approach is to write different code for each possible browser/i18n possibility. While this sounds absurd, it is far from unheard of. The Google Web Toolkit uses this approach. GWT has its "deferred binding" mechanism, used to specialize the compiler's output to different browsers and different localizations. GWT's "deferred binding" mechanism uses early binding: The GWT Java-to-JavaScript compiler figures out all possible ways the polymorphism might be needed, and spits out an entirely different "binary" for each.

The tradeoffs are similar. Wrapping your head around how you extend GWT using deferred binding can be a headache and a half; Having knowledge at compile time allows GWT's compiler to optimize each specialization separately, possibly yielding better performance, and smaller size for each specialization; The whole of a GWT application can end up being many times the size of a comparable jQuery application, due to all of the precompiled specializations.

score 0 · Answer 3 · answered May 04 '17 at 11:24

One benefit to runtime generics that no-one here has mentioned (?) is the possibility for code that is generated and injected into a running application, to use the same List, Hashmap / Dictionary etc. that everything else in that application is already using. Why you'd want to do that, is another question.

C++ Techniques: Type-Erasure vs. Pure Polymorphism

3 Answers3

Linked