
I would like to know the impact of both generic and explicit type implementations of a class/struct on performance and code/binary size.

For example, let's say I want to implement a tuple struct that can accept any of these value types (int, float, double).

There are two ways to go about it:

1- Use a generic struct with templates

template <class T>
struct tuple{
    T x,y;
    //... the rest of methods and operand implementations
};

2- Implement a copy of the struct for each type explicitly

struct tuplef{
    float x,y;
    //... the rest of methods and operand implementations
};

struct tuplei{
    int x,y;
    //... the rest of methods and operand implementations
};

struct tupled{
    double x,y;
    //... the rest of methods and operand implementations
};

In my opinion, the first approach is easier to update and maintain, but it is not safe when the user tries to use types that are not accounted for in some method implementations (which would require filtering and routing to type-specific implementations, and that might add some extra operations). The second approach is safer, since only specific types are accepted, but it is exhausting to deal with several versions of the code when updating a method's implementation, and it is very redundant and involves more lines of code.

Looking forward to being enlightened with different perspectives on this.

Note: I googled this first and couldn't find much on the matter.

Edit: one more point to consider is that with the first approach, the definitions of member methods that use the generic type must be visible wherever the class is used, so including the implementation file (cpp) is unavoidable; with the second approach we can just include the header file (h). It seems this has an impact related to the topic [check this out].
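For illustration, here is roughly how I picture the two approaches splitting across files (the file names and the sum method are just placeholders):

// tuple.h -- 1st approach: member definitions that use the generic type T
// must be visible to every translation unit that uses tuple<T>.
template <class T>
struct tuple{
    T x, y;
    T sum() const { return x + y; } // defined in the header (or a file the header includes)
};

// tuplef.h -- 2nd approach: users only need this declaration ...
struct tuplef{
    float x, y;
    float sum() const; // ... the definition can live in tuplef.cpp
};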

  • you could try compiling a small example of both, and check the size like this http://blog2.emptycrate.com/content/nobody-understands-c-part-5-template-code-bloat – wizurd Dec 05 '15 at 22:27

3 Answers


Regarding performance, these two ways will not differ, since generic types are expanded during compilation, i.e. the compiler will produce the same structs as in the second way, just with other names.

Because the compiler generates the structs for you, the binary size depends on how many different types you have used in your code. If you use tuple<int>, tuple<double>, tuple<char> and tuple<float>, then four different structs are generated, which means that your binary will be larger compared to the second method. However, you gain flexibility and maintenance is easier (as you already said).
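As a minimal sketch of this (the sum method is just a placeholder), only the instantiations that the code actually names get generated:

#include <cstdio>

template <class T>
struct tuple{
    T x, y;
    T sum() const { return x + y; }
};

int main(){
    tuple<int>    a{1, 2};      // instantiates tuple<int>
    tuple<double> b{1.5, 2.5};  // instantiates tuple<double>
    // tuple<float> is never used, so no code is generated for it
    std::printf("%d %f\n", a.sum(), b.sum());
    return 0;
}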

If you see that one of your cases is very different from the others, then separate it out or make a specialised template. But always assume that you are covering more than just three types; that way you will see that maintenance is much easier with templates.

One more thing: since everything with templates happens at compile time, you will not get a runtime error. That is, if you pass a type to the template, either it compiles and works or the compiler gives you an error. You don't get a case where your code compiles correctly and then fails at runtime.
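A small sketch of that behaviour (the product method is hypothetical); note that a member is only instantiated when it is actually used, so the error appears at the call site, still at compile time:

#include <string>

template <class T>
struct tuple{
    T x, y;
    T product() const { return x * y; } // requires operator* for T
};

int main(){
    tuple<int> a{2, 3};
    a.product();                     // fine: int has operator*
    tuple<std::string> s{"a", "b"};  // compiles: product() is not instantiated yet
    // s.product();                  // compile-time error: no operator* for std::string
    return 0;
}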

– Ashkan
  • You're right, they are calculated at compile time, which would allow the compiler to optimize the generated types by limiting them to the ones actually used, and the generated duplicates would be similar to the 2nd approach. However, I believe the compiler may be able to optimize the usage of the duplicates instead of just duplicating them for different types, and as a result save memory by minimizing unnecessary duplicate implementations. Another point you missed is that the 1st approach is flexible but may not handle different types properly (requires handling code). – CME64 Dec 06 '15 at 06:47

Naturally the binary size is going to be a bit compiler/linker-dependent, but I've yet to find a case where using a class template and generating the appropriate template instantiations actually inflated binary size any more than the handwritten equivalent unless your handwritten tuples are exported across a dylib.

Linkers do a pretty fantastic job here at eliminating redundant code between multiple translation units. This is not something I'm merely taking for granted. At my previous workplace, we had to deal with a very obsessive mindset about binary distribution size, and had to effectively show that these kinds of class templates that had direct handwritten equivalents did not actually increase the distribution size any more than the handwritten equivalents.

There are cases where code generation of any sort can bloat binaries, but that's typically for cases where code generation is used as a static alternative to dynamic forms of branching (static vs. dynamic polymorphism, e.g.). For example, compare std::sort to C's qsort. If you sorted a boatload of trivially-constructible/destructible types stored contiguously with std::sort and then qsort, chances are that qsort would yield a smaller binary as it involves no code generation and the only unique code required per type would be a comparator. std::sort would generate a whole new sorting function for each type handled differently with the comparator potentially inlined.
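As a rough sketch of that contrast (the element type and comparator are arbitrary):

#include <algorithm>
#include <cstdlib>
#include <vector>

// C-style comparator for qsort: one function, always called through a pointer.
static int cmp_int(const void* a, const void* b){
    const int lhs = *static_cast<const int*>(a);
    const int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);
}

void sort_both_ways(std::vector<int>& v, std::vector<int>& w){
    // qsort: no per-type code generation; every element type reuses the same
    // sorting routine, only the comparator is unique.
    std::qsort(v.data(), v.size(), sizeof(int), cmp_int);

    // std::sort: a sorting routine is generated per iterator/comparator
    // combination, with the comparison typically inlined.
    std::sort(w.begin(), w.end(), [](int a, int b){ return a < b; });
}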

That said, std::sort typically runs 2-3 times faster than qsort in exchange for the larger binary due to exchanging dynamic dispatch for static dispatch, and that's typically where you see code generation making a difference -- when the choice is between speed (with code generation) or smaller binary size (without).

There are some aesthetics that might lead you to favor the handwritten version anyway like so:

struct tuplef{
    float x,y;
    //... the rest of methods and operand implementations
};

... but performance and binary size should not be among them. This kind of approach can be useful if you want these various tuples to diverge more in their design or implementation. For example, you might have a tupled which wants to align its members and use SIMD with an AoS rep, like so*:

* Not a great example of SIMD which only benefits from 128-bit XMM registers, but hopefully enough to make a point.

struct tupled{
    ALIGN16 double xy[2]; // ALIGN16 assumed to expand to alignas(16) or a compiler-specific alignment attribute
    //... the rest of methods and operand implementations in SIMD
};

... this kind of variation can be quite awkward and unwieldy to implement if you just have one generic tuple.

template <class T>
struct tuple{
    T x,y;
    //... the rest of methods and operand implementations
};
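(For completeness: one way to keep the template and still diverge is a full specialisation, sketched below with C++11 alignas standing in for ALIGN16 -- though it arguably just relocates the awkwardness rather than removing it.)

// Full specialisation: tuple<double> gets the SIMD-friendly layout while the
// generic template above keeps serving every other type.
template <>
struct tuple<double>{
    alignas(16) double xy[2];
    //... the rest of methods and operand implementations in SIMD
};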

It is worth noting with a class template like this that you don't necessarily need to make everything a member function of the class. You can gain a lot more flexibility and simplicity by preferring non-members like so:

typedef tuple<float> tuplef;
typedef tuple<double> tupled;

/// 'some_operation' is only available for floating-point tuples.
double some_operation(const tupled& xy) {...}
float some_operation(const tuplef& xy) {...}

... where you can now use plain old function overloading in cases where implementations of some_operation need to diverge from each other based on the type of tuple. You can also omit overloads of some_operation for types where it doesn't make sense and get that kind of filtering and routing behavior you were talking about. Favoring nonmembers also helps prevent your tuple from turning into a monolith and decouples it from operations which don't apply equally to all tuples.
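A quick usage sketch under the definitions above (tuple<int> deliberately has no overload):

int main(){
    tuplef a{1.0f, 2.0f};
    tuple<int> b{1, 2};
    some_operation(a);    // OK: an overload exists for tuple<float>
    // some_operation(b); // compile-time error: no overload accepts tuple<int>
    return 0;
}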

You can also achieve this, of course, with some fancier techniques while still keeping everything a member of the class. Yet favoring nonmembers here for implementations which diverge between various types of tuples, or ones that only apply to certain types of tuples, can help keep the code a lot more plain. You can favor members for common denominator operations that apply to all tuples and are implemented pretty much the same way, while favoring nonmembers for operations which diverge between tuple types, e.g.
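One such technique, just as a sketch using C++11 SFINAE (the length member is hypothetical), is to restrict individual members to certain element types while keeping them inside the class:

#include <cmath>
#include <type_traits>

template <class T>
struct tuple{
    T x, y;

    // Common-denominator member: implemented the same way for every T.
    tuple operator+(const tuple& o) const { return tuple{x + o.x, y + o.y}; }

    // Member that only participates for floating-point tuples; for tuple<int>
    // it drops out of the overload set, so calling it fails to compile.
    template <class U = T,
              class = typename std::enable_if<std::is_floating_point<U>::value>::type>
    U length() const { return std::sqrt(x * x + y * y); }
};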

  • Good points there. So you're talking about a hybrid approach. I prefer using them as members since most of the operations are operator implementations (+ - * / == != ...). I could have different methods targeted at each type, but can they be compiled depending on the choice of the generic type? (i.e. if float, only compile method x for floats and ignore the overloaded methods, to save memory). Also, could you please comment on the edit I added? – CME64 Dec 06 '15 at 06:53
  • @CME64 It gets a little more knee-deep in template code but you can do this kind of "filtering/routing" using methods. About templates, yeah, typically their implementation needs to be visible at the time of code generation, and changes to their implementation do require the recompilation of all dependent translation units. If that's a big enough concern, it might be worth implementing all these tuples separately. You can still, say, use a generic tuple to implement a non-generic tuple. – Dec 06 '15 at 09:30
  • Hmm, I'm currently developing a game engine for fun and performance is key. Also, I'm trying to separate the assemblies so that I don't need to recompile the whole codebase all together; this template implementation visibility could go against that in this case. – CME64 Dec 06 '15 at 09:59
  • @CME64 You might be past the point of this advice, but I work on path tracers where speed is pretty much the sole measure of quality besides correctness so I understand the concern. But performance still constitutes a very small portion of a codebase -- it's easier to opt into it for the critical paths than to try to achieve it uniformly. It's difficult to generalize a math lib, for example, and still make it the most optimal solution performance-wise, since a specific site of code might most benefit from very broad optimization strategies which often want to obliterate away... – Dec 07 '15 at 00:38
  • @CME64 .... those math structures in favor of SIMD intrinsics which need to consider data in bulk (objects always want to kind of tackle operations on a per-object basis). It's typically easier if you don't let those concerns dominate (too much) the most general and reused portions of your codebase. Also, templates won't be an obstacle to performance unless you want to do something like make your tuplef have a non-uniform rep from tuplei, e.g. -- the templates will only get in the way if each type of tuple can significantly diverge in memory layout and implementation. – Dec 07 '15 at 00:40
  • Path tracers must have been a pain to optimize. I was thinking that if I build an optimal math library (hence clean code blocks), then it will deliver good performance wherever it's used, since it will be used a lot. I will try to work with templates for now and whenever it bites me in the face, I will revert back to explicit typing and post an update about it in this thread. Thanks for your advice and effort. – CME64 Dec 07 '15 at 06:21

There is a third alternative: use the enable_if idiom to enable a template only for specific types. Using a type that is not enabled will result in a compiler error. Example:

#include <type_traits>

// base template: declared but left undefined, so instantiating it with a
// type that is not enabled below is a compile error
template<typename T, typename = void>
struct tuple;

template<typename T>
inline constexpr bool enable()
{ return std::is_integral<T>::value || std::is_floating_point<T>::value; }

// enable template for ints/floats, etc.
template<typename T>
struct tuple<T, typename std::enable_if<enable<T>()>::type>
{
    T x, y;
    // other methods here.
};

This approach combines the advantage of generic templating (i.e. reduced duplication) with control over which types can be used. Note that there is a cost: extensibility is limited. If the user wants to introduce a new type and use tuple<T> with it, they will be unable to (without providing a duplicate implementation, of course).
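A quick usage sketch with the definitions above:

int main(){
    tuple<int>    a{1, 2};     // OK: integral types are enabled
    tuple<double> b{1.5, 2.5}; // OK: floating-point types are enabled
    // tuple<const char*> c{}; // compile-time error: only the undefined base template matches
    return 0;
}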

EDIT I realize that this answer doesn't directly answer your question, but I still think it's relevant to the topic, so I've made it a community post and left it as-is.

– Joel Cornett
  • It's not directly related, but it is helpful for fool-proofing against unwanted types. Thanks for sharing this. – CME64 Dec 06 '15 at 06:35