How can a C++ template be specialized for all 32-bit POD types?

Question

I've developed a simple template function for swapping the byte order of a single field:

template <typename T> inline void SwapEndian(T& ptr) {
    char *bytes = reinterpret_cast<char*>(&ptr);
    int a = sizeof(T) / 2;
    while (a--) {
        char tmp = bytes[a];
        int b = sizeof(T) - 1 - a;
        bytes[a] = bytes[b];
        bytes[b] = tmp;
    }
}

I'll often use it where T = int or float. Both of these types are represented by 4 bytes on the target platforms, and can be processed by the same specialization of the template.

Because this function is sometimes responsible for processing large buffers of raw data, I've created an optimized specialization:

template<> inline void SwapEndian(float& ptr) {
    #if defined(__GNUC__)
        *reinterpret_cast<unsigned*>(&ptr) = __builtin_bswap32(*reinterpret_cast<unsigned*>(&ptr));

    #elif defined(_MSC_VER)
        *reinterpret_cast<unsigned*>(&ptr) = _byteswap_ulong(*reinterpret_cast<unsigned*>(&ptr));

    #endif
}

This specialization also works with 32-bit integers, signed or unsigned, so I have a big smelly pile of duplicates with only the type name different.

How do I route all instantiations of 4-byte POD types through this one template? (PS. I'm open to solving this in a different way, but in that case I'd like to know definitively whether or not it's possible to build this kind of meta-specialized template.)

EDIT: Thanks everyone. After reading the answers and realizing that arithmetic is a better restriction than POD, I was inspired to write something. All the answers were useful but I could only accept one, so I accepted the one that appears to be structurally the same.

template<bool, bool> struct SwapEndian_ { template<typename T> static inline void _(T&); };
template<> template<typename T> inline void SwapEndian_<true, true>::_(T& ptr) {
    // ... stuff here ...
}
// ... more stuff here ...
template<typename T> inline void SwapEndian(T& ptr) {
    static_assert(is_arithmetic<T>::value, "Endian swap not supported for non-arithmetic types.");
    SwapEndian_<(sizeof(T) & (8 | 4)) != 0, (sizeof(T) & (8 | 2)) != 0>::template _<T>(ptr);
}
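
For anyone decoding the two bool arguments: the elided bodies are not shown, so this is only one reading of the masks. `sizeof(T) & (8 | 4)` is nonzero for sizes 4 and 8, and `sizeof(T) & (8 | 2)` is nonzero for sizes 2 and 8, so the pair appears to encode the size classes 1, 2, 4 and 8. The sketch below checks that mapping with a hypothetical helper, `SizeClass`, which is not part of the original code and is written with explicit `!= 0` comparisons so that nothing relies on an implicit integer-to-bool conversion.

```cpp
#include <cstddef>

// Hypothetical helper (not in the original post): mirrors the two template
// arguments that SwapEndian passes to SwapEndian_ in the EDIT.
template <std::size_t Size>
struct SizeClass {
    static constexpr bool first  = (Size & (8 | 4)) != 0;  // set for sizes 4 and 8
    static constexpr bool second = (Size & (8 | 2)) != 0;  // set for sizes 2 and 8
};

static_assert(!SizeClass<1>::first && !SizeClass<1>::second, "1 byte  -> <false, false>");
static_assert(!SizeClass<2>::first &&  SizeClass<2>::second, "2 bytes -> <false, true>");
static_assert( SizeClass<4>::first && !SizeClass<4>::second, "4 bytes -> <true, false>");
static_assert( SizeClass<8>::first &&  SizeClass<8>::second, "8 bytes -> <true, true>");

// Caveat: sizes outside {1, 2, 4, 8} alias onto the same four classes.
static_assert(!SizeClass<16>::first && !SizeClass<16>::second, "16 bytes looks like 1 byte");
```

If that reading is right, a 16-byte arithmetic type (some platforms' `long double`) would land in the same class as a single byte, so an extra `static_assert` on `sizeof(T)` may be worth adding.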

Oh man sweet! that does seem just about right... provided that POD types expose their sizeof() in a compatible context. -- As for now, I guess that's the nail in the coffin and I'm stuck with the duplicates. — MickLH, Feb 20 '15 at 17:01
@OmnipotentEntity concepts, while awesome, are just syntactic sugar. — Yakk - Adam Nevraumont, Feb 20 '15 at 17:01

Yakk - Adam Nevraumont · Answer 1 · 2015-02-20T19:00:07.613

7

When in doubt, tag dispatch.

This implementation dispatches on two traits: is_pod and get_sizeof_t. The base overload forwards to the tagged SwapEndian overloads, passing those two traits as tag arguments. There is one overload for 4-byte POD types, one for POD types of any other size, and one (which I'd advise =delete-ing) for non-POD types.

Extension to new traits and types is relatively easy.

#include <cstddef>
#include <iostream>
#include <type_traits>

template<size_t n>
using sizeof_t = std::integral_constant<size_t, n>;
template<class T>
using get_sizeof_t = sizeof_t<sizeof(T)>;

template <class T>
void SwapEndian(T& t, std::true_type /*is pod*/, sizeof_t<4>) {
  std::cout << "4 bytes!\n";
  // code to handle 32 bit pods
}
template <class T, size_t n>
void SwapEndian(T& t, std::true_type /*is pod*/, sizeof_t<n>) {
  std::cout << "pod\n";
  // code to handle generic case
}
template <class T, size_t n>
void SwapEndian(T& t, std::false_type /*is pod*/, sizeof_t<n>) {
  std::cout << "not pod\n";
  // probably want to =delete this overload actually 
}
template<class T>
void SwapEndian(T& t) {
    SwapEndian(t, std::is_pod<T>{}, get_sizeof_t<T>{});
}

I am not sure if this is a good idea, but the above should do it.

Uses some C++14 features. Assumes CHAR_BIT is 8.
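
A filled-in sketch of the same dispatch, under assumptions that are not in the answer above: the tag carries `sizeof(T) * CHAR_BIT` (as suggested in the comments) instead of `sizeof(T)`, the restriction is `std::is_arithmetic` rather than `std::is_pod`, and the names `bits_t`, `get_bits_t`, `Swap32`, `SwapEndianImpl` and `SwapEndianDemo` are illustrative. The 32-bit path copies through `std::uint32_t` with `std::memcpy` rather than casting pointers, and `Swap32` itself presumes 8-bit bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Tag carrying the width of T in bits rather than in bytes.
template <std::size_t Bits>
using bits_t = std::integral_constant<std::size_t, Bits>;
template <class T>
using get_bits_t = bits_t<sizeof(T) * CHAR_BIT>;

// Portable 32-bit byte swap written with shifts and masks.
inline std::uint32_t Swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// 32-bit path: copy out, swap, copy back (no aliasing casts).
template <class T>
void SwapEndianImpl(T& t, bits_t<32>) {
    std::uint32_t raw;
    std::memcpy(&raw, &t, sizeof raw);
    raw = Swap32(raw);
    std::memcpy(&t, &raw, sizeof raw);
}

// Generic path: reverse the object's bytes in place.
template <class T, std::size_t Bits>
void SwapEndianImpl(T& t, bits_t<Bits>) {
    unsigned char* bytes = reinterpret_cast<unsigned char*>(&t);
    std::reverse(bytes, bytes + sizeof(T));
}

// Entry point: builds the tag and lets overload resolution pick the path.
template <class T>
void SwapEndian(T& t) {
    static_assert(std::is_arithmetic<T>::value, "arithmetic types only");
    SwapEndianImpl(t, get_bits_t<T>{});
}

// Small self-check showing the intended behavior.
inline void SwapEndianDemo() {
    std::uint32_t u = 0x11223344u;
    SwapEndian(u);  // takes the 32-bit overload
    assert(u == 0x44332211u);
}
```

The `bits_t<32>` overload wins over the generic one through partial ordering, which is the same mechanism that makes the `sizeof_t<4>` overload win above.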

You should only rarely specialize template functions. Instead, overload. Tag dispatching lets overload resolution choose, at compile time, which code to run.
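
A minimal sketch of why function template specializations can surprise: a specialization never takes part in overload resolution, it only replaces the body of a primary template after that template has already been chosen. The function names `Describe` and `OverloadDemo` are made up for illustration.

```cpp
#include <cassert>
#include <string>

// (a) primary template
template <class T>
std::string Describe(T) { return "primary"; }

// (b) a second primary template that overloads (a) for pointers
template <class T>
std::string Describe(T*) { return "pointer overload"; }

// (c) explicit specialization of (a) for T = int*
template <>
inline std::string Describe<int*>(int*) { return "specialization of (a)"; }

inline void OverloadDemo() {
    int value = 0;
    int* p = &value;
    // Overload resolution compares (a) and (b) only; (b) is more specialized,
    // so (c) is never considered even though it looks like an exact match.
    assert(Describe(p) == "pointer overload");
    // Explicitly asking for (a) with T = int* is the only way to reach (c).
    assert(Describe<int*>(p) == "specialization of (a)");
}
```

With plain overloads or tags, every candidate competes in one overload set, so this kind of silent bypass does not arise.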

edited Feb 20 '15 at 19:00

answered Feb 20 '15 at 17:03

Yakk - Adam Nevraumont

Well, technically speaking we could have a machine with 16-bit chars. So `CHAR_BIT*sizeof(T) == 32` would be a better condition, wouldn't it? (Edit: I must have overlooked that you mentioned the assumption, but still) – Columbo Feb 20 '15 at 17:12
@Columbo agreed. But if this is the first chunk of code that breaks in your project when `CHAR_BIT` does not equal `8`, well, I don't believe you. :) – Yakk - Adam Nevraumont Feb 20 '15 at 18:49
@Columbo Also, rewritten, because tag dispatching is just cleaner. – Yakk - Adam Nevraumont Feb 20 '15 at 19:00
"You should only rarely specialize template functions." - Why is overloading better than specialization? – Michael Gazonda Feb 20 '15 at 19:02
@MichaelGazonda specialization must be full, and does not change which overload is selected. It just changes the implementation of the selected overload. Specialization of template types changes both which specialization is selected, and its implementation. Between tag dispatching and forward-to-template type and SFINAE, you have the tools you need without using specializations. Specializations of template functions "look like" overloads so much that people often confuse them, and that leads to unexpected dispatching. There are rare use cases, but even then, fragile. – Yakk - Adam Nevraumont Feb 20 '15 at 19:05
@Yakk I generally prefer specialization where possible. Tag dispatching seems like more of a hack to me. It uses the generation of runtime code to do work at compile time... and then you hope/expect this to be optimized out later. I greatly prefer that specialization stays in the compile-time context. – Michael Gazonda Feb 20 '15 at 19:18
@MichaelGazonda: Tag dispatching is used a lot in the implementation details of C++ libraries. Any decent optimizer will remove the extra function calls. I appreciate the desire for a purer-feeling implementation, but in practice, compilers are good enough to make it work. Tag dispatching can be very useful when you have a number of implementations that you need to choose from at compile time. I also feel like this answer has, *by far*, the most readable implementation, which is another valuable trait. – Jason R Feb 20 '15 at 19:29
@MichaelGazonda what is this runtime you are talking about? By the C++ language standard, no work need be done by the machine at run time during a properly written tag dispatch "call". The return values can be fully elided, and the forwarding is more than perfect enough (references and trivial stateless types galore). The compiler is free to anti-optimize, but against some enemies we cannot win: we must assume an acceptable QoI from the compiler. – Yakk - Adam Nevraumont Feb 20 '15 at 19:31
@JasonR and Yakk - I appreciate the responses. I suppose I'm a bit jaded by how often I read something like "let the compiler do its job!" only to find out that I need to optimize something myself. I would rather be as explicit as possible where I can be and then resort to compiler tricks when all else fails. I would feel differently about this if the compiler was *obligated* to optimize here. – Michael Gazonda Feb 20 '15 at 19:41
@MichaelGazonda the compiler is not obligated to not insert `for(int i = 0; i < std::numeric_limits<int>::max()-1; ++i);` between every statement in your program. You don't get your wish. – Yakk - Adam Nevraumont Feb 20 '15 at 20:00
A compiler de-optimizing code is a separate issue from a compiler failing to fully optimize code. – Michael Gazonda Feb 20 '15 at 21:01
@MichaelGazonda My position is different. I write C++, and C++ describes what it means. A bad quality of implementation could inject random crap, or fail to elide return values, or inject pointless function calls: *all* are equally unacceptable. If you treat C++ as some kind of a one-to-one mapping with assembler, you will both be wrong quite often, and will write far worse code. That mapping is useful when learning C++, but saying "anything except that mapping cannot be relied upon" is too large a sacrifice. – Yakk - Adam Nevraumont Feb 20 '15 at 21:09
I don't think we understand each other very well, so this is a good place to end this discussion. – Michael Gazonda Feb 20 '15 at 22:12

Michael Gazonda · Accepted Answer · 2015-02-20T18:41:22.023

I'm using a separate SwapEndian and SwapEndianImpl so that SwapEndian can rely on template argument deduction while SwapEndianImpl is specialized on a compile-time bool.

#include <climits>
#include <iostream>
#include <type_traits>

template<bool> struct SwapEndianImpl
{
    template<typename t> static inline void Func(t& n);
};
template<> template<typename t> void SwapEndianImpl<false>::Func(t& n)
{
    std::cout << "not 32bit pod" << std::endl;
}
template<> template<typename t> void SwapEndianImpl<true>::Func(t& n)
{
    std::cout << "32bit pod" << std::endl;
}

template<typename t> inline void SwapEndian(t& n)
{
    SwapEndianImpl<std::is_pod<t>::value && sizeof(t) == (32 / CHAR_BIT)>::template Func<t>(n);
}

I believe that this is a better way to go than SFINAE if you specialize to more than two conditions.
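
For comparison, a sketch of what the SFINAE route might look like for the same two-way split. The trait `Is32BitArithmetic` and the function `WhichSwap` are hypothetical names, the condition uses `std::is_arithmetic` rather than `std::is_pod`, and the functions return a label instead of swapping so the selection is easy to observe.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical trait: true for arithmetic types that are exactly 32 bits wide.
template <class T>
struct Is32BitArithmetic
    : std::integral_constant<bool,
          std::is_arithmetic<T>::value && sizeof(T) * CHAR_BIT == 32> {};

// Enabled only for 32-bit arithmetic types.
template <class T>
typename std::enable_if<Is32BitArithmetic<T>::value, std::string>::type
WhichSwap(T&) { return "32-bit path"; }

// Enabled for everything else. The condition has to be the exact negation
// of the one above, otherwise some calls become ambiguous or have no match.
template <class T>
typename std::enable_if<!Is32BitArithmetic<T>::value, std::string>::type
WhichSwap(T&) { return "generic path"; }

inline void SfinaeDemo() {
    std::uint32_t narrow = 0;
    std::uint64_t wide = 0;
    assert(WhichSwap(narrow) == "32-bit path");
    assert(WhichSwap(wide) == "generic path");
}
```

Each additional case means revisiting every existing `enable_if` condition to keep them mutually exclusive, which is the bookkeeping a bool- or tag-based dispatcher avoids.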

score 2 · Answer 3 · 2015-02-20T17:47:20.603

You might limit your swap to arithmetic types (rather than all POD types) and use specialized class templates for flexibility:

#include <climits>
#include <iostream>
#include <type_traits>

namespace Detail {
    template <
        typename T,
        unsigned N = sizeof(T) * CHAR_BIT,
        bool Swap = std::is_arithmetic<T>::value>
    struct SwapEndian
    {
        static void apply(T&) {
            std::cout << "Not Swapping\n";
        }
    };

    template <typename T>
    struct SwapEndian<T, 16, true>
    {
        static void apply(T&) {
            std::cout << "Swapping\n";
        }
    };

    template <typename T>
    struct SwapEndian<T, 32, true>
    {
        static void apply(T&) {
            std::cout << "Swapping\n";
        }
    };

    template <typename T>
    struct SwapEndian<T, 64, true>
    {
        static void apply(T&) {
            std::cout << "Swapping\n";
        }
    };
}

template <typename T>
void SwapEndian(T& value) {
    Detail::SwapEndian<T>::apply(value);
}

struct Structure
{
    char s[4];
};
static_assert(std::is_pod<Structure>::value, "Should be POD");


int main() {
    char c;
    short s;
    int i;
    long long l;
    float f;
    double d;
    void* p;
    Structure structure;
    SwapEndian(c);
    SwapEndian(s);
    SwapEndian(i);
    SwapEndian(l);
    SwapEndian(f);
    SwapEndian(d);
    SwapEndian(p);
    SwapEndian(structure);
}
