
I have lots of strings, each 8 characters or shorter.

I need to do lots of comparisons on them using memcmp() / strcmp().

I wonder whether comparisons would be faster if I converted all of them to std::uint64_t. In that case, at least in theory, each comparison would be branch-free and would happen in a single CPU operation.
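A minimal sketch of the idea, assuming every string has at most 8 characters, contains no embedded '\0', and is treated as zero-padded to 8 bytes: if the bytes are packed with the first character in the most significant byte, unsigned integer order matches byte-wise (memcmp) order. The three-way result can then be computed as `(a > b) - (a < b)`, which compilers typically turn into compare/set instructions rather than a branch (that is up to the compiler). `pack8` and `compare_packed` are hypothetical names, not part of any library.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: packs a NUL-terminated string of at most 8 chars into
// a uint64_t, first character in the most significant byte, zero-padded.
inline std::uint64_t pack8(const char *s)
{
    std::uint64_t v = 0;
    std::size_t i = 0;
    for (; i < 8 && s[i] != '\0'; ++i)          // copy the real characters
        v = (v << CHAR_BIT) | static_cast<unsigned char>(s[i]);
    for (; i < 8; ++i)                          // shift in zero padding bytes
        v <<= CHAR_BIT;
    return v;
}

// Three-way comparison (-1, 0 or 1) with the same sign convention as
// memcmp()/strcmp() on the original zero-padded strings.
inline int compare_packed(std::uint64_t a, std::uint64_t b)
{
    return (a > b) - (a < b);
}
```

One caveat of this representation: "ab" and "ab\0" pack to the same value, so it only works for strings without embedded NULs.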

Has anyone tried something similar?

Here is some test code that generates those numbers. I am assuming a little-endian machine.

I know the code could be simplified significantly if I used htobe32() / htobe64().
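For comparison, here is a sketch of what the htobe64() route amounts to, written without the glibc/BSD `<endian.h>` extension: load 8 zero-padded bytes with std::memcpy (which has no alignment requirement on the source) and reverse the byte order only on a little-endian host. `host_is_little_endian`, `bswap64` and `load_be64` are hypothetical names; mainstream compilers usually recognize the shift/mask pattern as a single byte-swap instruction, but that is worth checking in the generated code.

```cpp
#include <cstdint>
#include <cstring>

// Detects host byte order at run time (C++20 code could use std::endian).
inline bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Portable 64-bit byte swap built from shifts and masks.
inline std::uint64_t bswap64(std::uint64_t x)
{
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}

// Reads exactly 8 bytes (the caller supplies a zero-padded 8-byte buffer)
// and returns them as a big-endian number: first byte most significant.
inline std::uint64_t load_be64(const char *buf)
{
    std::uint64_t v;
    std::memcpy(&v, buf, sizeof v);   // safe even if buf is not 8-byte aligned
    return host_is_little_endian() ? bswap64(v) : v;
}
```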

#include <cstdint>

#include <algorithm>    // std::reverse_copy

namespace rev_impl{
    template<typename T>
    T rev(const char *s){
        T t;
        char *pt = reinterpret_cast<char *>(&t);

        std::reverse_copy(s, s + sizeof(T), pt);

        return t;
    }
}

inline std::uint32_t rev32(const char *s){
    return rev_impl::rev<std::uint32_t>(s);
}

inline std::uint64_t rev64(const char *s){
    return rev_impl::rev<std::uint64_t>(s);
}


#include <iostream>
#include <iomanip>

template<typename T>
void print_rev(const char *s){
    constexpr auto w = sizeof(T) * 2;

    std::cout << std::setw(w) << std::setfill('.') << std::hex << rev_impl::rev<T>(s) << '\n';
}

inline void print_rev32(const char *s){
    return print_rev<std::uint32_t>(s);
}

inline void print_rev64(const char *s){
    return print_rev<std::uint64_t>(s);
}

int main(){
    print_rev64("\0\0\0\0\0\0\0a");
    print_rev64("a\0\0\0\0\0\0\0");

    print_rev32("Niki");
    print_rev32("Nika");
    print_rev32("Nikz");
}

Here is the test output:

..............61
6100000000000000
4e696b69
4e696b61
4e696b7a
  • How would you convert them if the size is less? Do you have padding characters past the end of string allocated? You could in theory use this approach being aware of platform endianness, but reverse_copy in your implementation kills all the gain in performance. – bipll Feb 18 '18 at 14:16
  • You may also have problems using a uint64_t comparison if the data is not aligned on a 64-bit (8-byte) boundary (address). – Marker Feb 18 '18 at 14:42
  • @bipll pad with zeroes. – Nick Feb 18 '18 at 15:07
  • @Marker let's suppose we align it correctly. – Nick Feb 18 '18 at 15:08
  • Comparing 8 bytes as one 64-bit number should certainly be faster than comparing 8 individual bytes; I have done similar things for comparing large blocks of memory. Is your use case faster? I guess it all depends on how much overhead you will have zero padding and making sure that everything is aligned on a 64-bit boundary. – Marker Feb 18 '18 at 16:24
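Following up on the padding and alignment comments, one way to sidestep both (a sketch, not a benchmark-backed recommendation) is to pack each string into an integer once, keep the integers in a std::vector<std::uint64_t> (whose elements are correctly aligned by the allocator), and do all later comparisons on integers, for example sort plus binary search. `pack_key`, `make_keys` and `contains` are hypothetical names; the sketch assumes strings of at most 8 characters with no embedded NULs.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Packs s into 8 bytes, first char most significant, zero-padded on the
// right. Assumes s.size() <= 8; longer strings are silently truncated.
inline std::uint64_t pack_key(const std::string &s)
{
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < 8; ++i)
    {
        const unsigned char c = i < s.size() ? static_cast<unsigned char>(s[i]) : 0;
        v = (v << CHAR_BIT) | c;
    }
    return v;
}

// Converts all strings once; afterwards every comparison is an integer one.
inline std::vector<std::uint64_t> make_keys(const std::vector<std::string> &strs)
{
    std::vector<std::uint64_t> keys;
    keys.reserve(strs.size());
    for (const auto &s : strs)
        keys.push_back(pack_key(s));
    std::sort(keys.begin(), keys.end());   // integer order == memcmp order
    return keys;
}

inline bool contains(const std::vector<std::uint64_t> &sorted_keys, const std::string &s)
{
    return std::binary_search(sorted_keys.begin(), sorted_keys.end(), pack_key(s));
}
```

The conversion cost is paid once per string, which only pays off if each string takes part in many comparisons.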

1 Answer

If you only need to convert string literals, you can write rev to accept an array of chars, as follows:

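template <typename T, std::size_t N,
          typename = typename std::enable_if<(N<=sizeof(T)+1U)>::type>
constexpr T rev (char const (&arr)[N])
 {
   T ret = 0;

   std::size_t  ui = 0;

   // the N-1 characters (the terminating '\0' is skipped);
   // unsigned char avoids sign extension of bytes above 0x7F
   for ( ; ui < N-1U ; ++ui )
      ret <<= CHAR_BIT, ret |= static_cast<unsigned char>(arr[ui]);

   // zero-fill the remaining sizeof(T) - (N-1) bytes
   for ( ; ui < sizeof(T) ; ++ui )
      ret <<= CHAR_BIT;

   return ret;
 }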

Observe that, starting from C++14, this function can be declared constexpr (C++11 constexpr functions may not contain loops), so you can write something like

constexpr auto fb = rev<std::uint64_t>("foobar");
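Along the same lines, a self-contained C++14 sketch (here called `pack_literal`, a hypothetical name) lets the ordering property be checked entirely at compile time with static_assert, including a 7-character versus 8-character case. Like the version above, it casts each char through unsigned char so that bytes above 0x7F are not sign-extended.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Packs a string literal (terminator excluded) into T, first char in the
// most significant byte, zero-padded on the right. Requires C++14.
template <typename T, std::size_t N,
          typename = typename std::enable_if<(N <= sizeof(T) + 1U)>::type>
constexpr T pack_literal(char const (&arr)[N])
{
    T ret = 0;
    std::size_t i = 0;
    for (; i + 1U < N; ++i)                  // the N-1 real characters
        ret = static_cast<T>((ret << CHAR_BIT) | static_cast<unsigned char>(arr[i]));
    for (; i < sizeof(T); ++i)               // pad the remaining bytes with zeroes
        ret = static_cast<T>(ret << CHAR_BIT);
    return ret;
}

static_assert(pack_literal<std::uint64_t>("foobar") == 0x666f6f6261720000ULL, "");
static_assert(pack_literal<std::uint32_t>("Nika") < pack_literal<std::uint32_t>("Niki"), "");
static_assert(pack_literal<std::uint64_t>("bbbbbbb") > pack_literal<std::uint64_t>("aaaaaaaa"), "");
```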

The following is your code rewritten to use string literals:

#include <cstddef>
#include <cstdint>
#include <climits>
#include <iostream>
#include <iomanip>
#include <type_traits>

namespace rev_impl
 {
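    template <typename T, std::size_t N,
              typename = typename std::enable_if<(N<=sizeof(T)+1U)>::type>
    T rev (char const (&arr)[N])
     {
       T ret = 0;

       std::size_t  ui = 0;

       // the N-1 characters (skip '\0'); unsigned char avoids sign extension
       for ( ; ui < N-1U ; ++ui )
          ret <<= CHAR_BIT, ret |= static_cast<unsigned char>(arr[ui]);

       // zero-fill the remaining sizeof(T) - (N-1) bytes
       for ( ; ui < sizeof(T) ; ++ui )
          ret <<= CHAR_BIT;

       return ret;
     }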
 }

template <std::size_t N>
inline std::uint32_t rev32 (char const (&s)[N])
 { return rev_impl::rev<std::uint32_t>(s); }

template <std::size_t N>
inline std::uint64_t rev64 (char const (&s)[N])
 { return rev_impl::rev<std::uint64_t>(s); }

template<typename T, std::size_t N>
void print_rev (char const (&s)[N])
 {
   constexpr auto w = sizeof(T) * 2;

   std::cout << std::setw(w) << std::setfill('.') << std::hex
      << rev_impl::rev<T>(s) << '\n';
 }

template <std::size_t N>
inline void print_rev32 (char const (&s)[N])
 { return print_rev<std::uint32_t>(s); }

template <std::size_t N>
inline void print_rev64 (char const (&s)[N])
 { return print_rev<std::uint64_t>(s); }

int main ()
 {
   print_rev64("\0\0\0\0\0\0\0a");
   print_rev64("a\0\0\0\0\0\0\0");

   print_rev32("Niki");
   print_rev32("Nika");
   print_rev32("Nikz");
 }
  • The question is: will the comparison be faster? – Nick Feb 18 '18 at 15:31
  • @Nick - `std::memcmp()` vs an integer comparison? I suppose it is. The problem I see is the time you need to convert the strings to numbers. If you can use string literals, I suppose you can do it compile time (`constexpr`). – max66 Feb 18 '18 at 15:36
  • @max66: When you want to know which way is faster, try them both and see. – John Zwinck Feb 19 '18 at 12:44
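Taking the "try them both and see" advice literally, a rough micro-benchmark harness could look like the sketch below. It is an illustration under assumptions, not a verdict: the data set, the function names (`make_strings`, `to_key`, `time_neighbours`, `compare_methods`) and the seed are made up, the conversion to integers is deliberately left out of the timed region (the conversion cost max66 raises must be measured separately), and results will depend heavily on compiler flags, data layout and CPU. Each method also counts its "less than" results, so the two can be checked for agreement.

```cpp
#include <chrono>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <vector>

struct Timing { std::size_t less_count; double millis; };
struct Comparison { Timing memcmp_way; Timing integer_way; };

// Hypothetical test data: n strings of length 1..8 over "abcd", each stored
// zero-padded in its own 8-byte slot of one contiguous buffer.
inline std::vector<char> make_strings(std::size_t n, unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> len(1, 8), letter('a', 'd');
    std::vector<char> buf(n * 8, '\0');
    for (std::size_t i = 0; i < n; ++i)
    {
        const int l = len(rng);
        for (int j = 0; j < l; ++j)
            buf[i * 8 + j] = static_cast<char>(letter(rng));
    }
    return buf;
}

// Big-endian packing: the first byte becomes the most significant one.
inline std::uint64_t to_key(const char *p)
{
    std::uint64_t v = 0;
    for (int j = 0; j < 8; ++j)
        v = (v << CHAR_BIT) | static_cast<unsigned char>(p[j]);
    return v;
}

// Times n-1 "is element i less than element i+1?" tests.
template <typename Less>
Timing time_neighbours(std::size_t n, Less less)
{
    const auto start = std::chrono::steady_clock::now();
    std::size_t count = 0;
    for (std::size_t i = 0; i + 1 < n; ++i)
        count += less(i, i + 1) ? 1 : 0;
    const auto stop = std::chrono::steady_clock::now();
    return {count, std::chrono::duration<double, std::milli>(stop - start).count()};
}

inline Comparison compare_methods(std::size_t n)
{
    const std::vector<char> strs = make_strings(n, 42u);
    std::vector<std::uint64_t> keys(n);   // conversion cost paid here, NOT timed
    for (std::size_t i = 0; i < n; ++i)
        keys[i] = to_key(&strs[i * 8]);

    Comparison r;
    r.memcmp_way = time_neighbours(n, [&](std::size_t a, std::size_t b) {
        return std::memcmp(&strs[a * 8], &strs[b * 8], 8) < 0;
    });
    r.integer_way = time_neighbours(n, [&](std::size_t a, std::size_t b) {
        return keys[a] < keys[b];
    });
    return r;
}
```

A caller would invoke something like `compare_methods(10000000)` from an optimized build and print both `millis` fields; if the two `less_count` values ever differ, the packing does not match memcmp order for that data.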