This is a continuation of "What is the function parameter equivalent of constexpr?" In the original question, we are trying to speed up some code that performs shifts and rotates under Clang and VC++. Clang and VC++ do not optimize the code well because they treat the shift/rotate amount as variable (i.e., not constexpr).

When I attempt to parameterize the shift amount and the word size, it results in:

$ g++ -std=c++11 -march=native test.cxx -o test.exe
test.cxx:13:10: error: function template partial specialization is not allowed
uint32_t LeftRotate<uint32_t, unsigned int>(uint32_t v)
         ^         ~~~~~~~~~~~~~~~~~~~~~~~~
test.cxx:21:10: error: function template partial specialization is not allowed
uint64_t LeftRotate<uint64_t, unsigned int>(uint64_t v)
         ^         ~~~~~~~~~~~~~~~~~~~~~~~~
2 errors generated.

Here's the test program. It's a bit larger than needed so folks can see we need to handle both uint32_t and uint64_t (not to mention uint8_t, uint16_t and other types).

$ cat test.cxx
#include <iostream>
#include <stdint.h>

template<typename T, unsigned int R>
inline T LeftRotate(unsigned int v)
{
  static const unsigned int THIS_SIZE = sizeof(T)*8;
  static const unsigned int MASK = THIS_SIZE-1;
  return T((v<<R)|(v>>(-R&MASK)));
};

template<uint32_t, unsigned int R>
uint32_t LeftRotate<uint32_t, unsigned int>(uint32_t v)
{
  __asm__ ("roll %1, %0" : "+mq" (v) : "I" ((unsigned char)R));
  return v;
}

#if __x86_64__
template<uint64_t, unsigned int R>
uint64_t LeftRotate<uint64_t, unsigned int>(uint64_t v)
{
  __asm__ ("rolq %1, %0" : "+mq" (v) : "J" ((unsigned char)R));
  return v;
}
#endif

int main(int argc, char* argv[])
{
  std::cout << "Rotated: " << LeftRotate<uint32_t, 2>((uint32_t)argc) << std::endl;
  return 0;
}

I've been through a number of iterations of error messages depending on how I attempt to implement the rotate. Other error messages include "no function template matches function template specialization...". Using template <> seems to produce the most incomprehensible one.

How do I parameterize the shift amount in hopes that Clang and VC++ will optimize the function call as expected?

jww

2 Answers

Another way is to turn the templated constant into a constant argument which the compiler can optimise away.

step 1: define the concept of a rotate_distance:

template<unsigned int R> using rotate_distance = std::integral_constant<unsigned int, R>;

step 2: define the rotate functions in terms of overloads of a function which takes an argument of this type:

template<unsigned int R>
uint32_t LeftRotate(uint32_t v, rotate_distance<R>);

Now, if we wish we can simply call LeftRotate(x, rotate_distance<y>()), which seems to express intent nicely,

or we can now redefine the 2-argument template form in terms of this form:

template<unsigned int Dist, class T>
T LeftRotate(T t)
{
  return LeftRotate(t, rotate_distance<Dist>());
}

Full Demo:

#include <iostream>
#include <stdint.h>
#include <type_traits>

template<unsigned int R> using rotate_distance = std::integral_constant<unsigned int, R>;

template<typename T, unsigned int R>
inline T LeftRotate(T v, rotate_distance<R>)
{
  static const unsigned int THIS_SIZE = sizeof(T)*8;
  static const unsigned int MASK = THIS_SIZE-1;
  return T((v<<R)|(v>>(-R&MASK)));
}

template<unsigned int R>
uint32_t LeftRotate(uint32_t v, rotate_distance<R>)
{
  __asm__ ("roll %1, %0" : "+mq" (v) : "I" ((unsigned char)R));
  return v;
}

#if __x86_64__
template<unsigned int R>
uint64_t LeftRotate(uint64_t v, rotate_distance<R>)
{
  __asm__ ("rolq %1, %0" : "+mq" (v) : "J" ((unsigned char)R));
  return v;
}
#endif


template<unsigned int Dist, class T>
T LeftRotate(T t)
{
  return LeftRotate(t, rotate_distance<Dist>());
}

int main(int argc, char* argv[])
{
  std::cout << "Rotated: " << LeftRotate((uint32_t)argc, rotate_distance<2>()) << std::endl;
  std::cout << "Rotated: " << LeftRotate((uint64_t)argc, rotate_distance<2>()) << std::endl;
  std::cout << "Rotated: " << LeftRotate<2>((uint64_t)argc) << std::endl;
  return 0;
}
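
The demo above depends on GCC-style inline assembly, so it cannot be checked at compile time or built with VC++. As a rough, portable sketch of the same tag-dispatch idea (my assumptions: C++11, the rotate distance satisfies 0 <= R < bit width of T, and plain shifts stand in for the `rol` instructions), the overload selection can be exercised like this:

```cpp
#include <cstdint>
#include <type_traits>

// Tag type carrying the rotate distance at compile time (same alias as above).
template<unsigned int R>
using rotate_distance = std::integral_constant<unsigned int, R>;

// Generic overload: shift-and-or rotate for any unsigned integer type T.
// (0u - R) & mask keeps the right-shift count in range even when R == 0.
template<typename T, unsigned int R>
constexpr T LeftRotate(T v, rotate_distance<R>)
{
  return static_cast<T>((v << R) | (v >> ((0u - R) & (sizeof(T) * 8 - 1))));
}

// More specialized overload for uint32_t. Partial ordering of function
// templates prefers it over the generic one; this is where the inline
// assembly version would go on x86.
template<unsigned int R>
constexpr std::uint32_t LeftRotate(std::uint32_t v, rotate_distance<R>)
{
  return (v << R) | (v >> ((0u - R) & 31u));
}

// Convenience wrapper: the distance is given explicitly, T is deduced.
template<unsigned int Dist, class T>
constexpr T LeftRotate(T t)
{
  return LeftRotate(t, rotate_distance<Dist>());
}
```

With these definitions, `LeftRotate<2>(std::uint32_t{1})` and `LeftRotate(std::uint32_t{1}, rotate_distance<2>())` both evaluate to 4, and because everything is constexpr the result can be verified with `static_assert`.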

pre-c++11 compilers

Prior to C++11 we had neither std::integral_constant nor alias templates, so we have to make our own version.

For our purposes, this is sufficient:

template<unsigned int R> struct rotate_distance {};
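
To illustrate why the empty struct is enough: the tag is only ever used for overload resolution and its value is never read. Here is a minimal sketch written in C++03 style, assuming 0 <= R < bit width of T and using a plain shift-based rotate instead of the inline assembly:

```cpp
// C++03 style: no alias templates, no <type_traits>, no constexpr.

// Empty tag type; the rotate distance lives only in the template argument.
template<unsigned int R> struct rotate_distance {};

// Generic overload for any unsigned integer type T.
template<typename T, unsigned int R>
inline T LeftRotate(T v, rotate_distance<R>)
{
  const unsigned int MASK = sizeof(T) * 8 - 1;
  // (0u - R) & MASK keeps the right-shift count in range even when R == 0.
  return T((v << R) | (v >> ((0u - R) & MASK)));
}

// Wrapper: the distance is given explicitly, T is deduced from the argument.
template<unsigned int Dist, class T>
inline T LeftRotate(T t)
{
  return LeftRotate(t, rotate_distance<Dist>());
}
```

Type-specific overloads such as the `uint32_t` one from the full demo can be added next to the generic overload unchanged.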

full proof - note the effect of optimisations:

https://godbolt.org/g/p4tsQ5

Richard Hodges
  • Thanks Richard. This may work for us. The pain point is moving from the clean room into production code. This particular shift/rotate code has been stable for about 20 years, so I have to be careful about how much gets nudged (especially on the older compilers, like GCC 3 and VC++ 2002). – jww Sep 04 '16 at 09:15
  • @jww I think you'll find that the generated assembler is absolutely identical (after optimisation). It's just a different way of expressing intent. – Richard Hodges Sep 04 '16 at 09:27
  • @jww note the reversal of template argument order in the 2-tparam form. This allows deduction of the second template argument while fixing the first. – Richard Hodges Sep 04 '16 at 09:28
  • Thanks again Richard. I added some code generation tests for the ["immediate rotate" instructions](http://github.com/weidai11/cryptopp/commit/cc1fe049cdfb235e03f80d794399ecef9879bb92). I'm seeing some potential sore spots due to the integrated assembler. I'm also not seeing `rorx` generation for BMI-capable processors. Next on to VC++ generation to see how things fare. – jww Sep 04 '16 at 17:01

Use a template class, rather than a template function, because class templates can be partially specialized while function templates cannot:

#include <iostream>
#include <stdint.h>


template<typename T, unsigned int R>
struct LeftRotate {
    static inline T compute(T v)
    {
        static const unsigned int THIS_SIZE = sizeof(T)*8;
        static const unsigned int MASK = THIS_SIZE-1;
        return T((v<<R)|(v>>(-R&MASK)));
    }
};


template<unsigned int R>
struct LeftRotate<uint32_t, R> {
    static inline uint32_t compute(uint32_t v)
    {
        __asm__ ("roll %1, %0" : "+mq" (v) : "I" ((unsigned char)R));
        return v;
    }
};

#if __x86_64__
template<unsigned int R>
struct LeftRotate<uint64_t, R> {
    static inline uint64_t compute(uint64_t v)
    {
        __asm__ ("rolq %1, %0" : "+mq" (v) : "J" ((unsigned char)R));
        return v;
    }
};
#endif

int main(int argc, char* argv[])
{
  std::cout << "Rotated: " << LeftRotate<uint32_t, 2>::compute((uint32_t)argc) << std::endl;
  return 0;
}
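
If the `::compute` call syntax is a concern for existing call sites, a thin function template can forward to the class. The sketch below is only an illustration under my own assumptions: `rotl_fixed` is a hypothetical name, the rotate bodies use portable shifts instead of the inline assembly, and 0 <= R < bit width of T is assumed.

```cpp
#include <cstdint>

// Primary class template: portable rotate for any unsigned integer type T.
template<typename T, unsigned int R>
struct LeftRotate {
    static constexpr T compute(T v)
    {
        return static_cast<T>((v << R) | (v >> ((0u - R) & (sizeof(T) * 8 - 1))));
    }
};

// Partial specialization on T = uint32_t, which is legal for a class template.
// The inline assembly version would go here on x86.
template<unsigned int R>
struct LeftRotate<std::uint32_t, R> {
    static constexpr std::uint32_t compute(std::uint32_t v)
    {
        return (v << R) | (v >> ((0u - R) & 31u));
    }
};

// Hypothetical helper (not part of the answer): restores function-call syntax
// by forwarding to whichever specialization matches T.
template<typename T, unsigned int R>
constexpr T rotl_fixed(T v)
{
    return LeftRotate<T, R>::compute(v);
}
```

A call then reads `rotl_fixed<std::uint32_t, 2>(x)`, and the compiler picks the specialization through the class template.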
Leon