1

I am using 2 x std::uint64_t and 1 x std::uint32_t in a high performance implementation of of operator<=> in a struct conataining a std::array<std::byte, 20>.

I am trying to make it cross compiler and architecture compatible.

As part of that I am trying to outright reject any architecture with std::endian::native which is not std::endian::little or std::endian::big.

I think I am running foul of the "static_assert must depend on a template parameter rule", as the struct and the member function are not templated.

  std::strong_ordering operator<=>(const pawned_pw& rhs) const {

    static_assert(sizeof(std::uint64_t) == 8);
    static_assert(sizeof(std::uint32_t) == 4);

    if constexpr (std::endian::native == std::endian::little) {

      // c++23 will have std::byteswap, so we won't need this
#ifdef _MSC_VER
#define BYTE_SWAP_32 _byteswap_ulong
#define BYTE_SWAP_64 _byteswap_uint64
#else
#define BYTE_SWAP_32 __builtin_bswap32
#define BYTE_SWAP_64 __builtin_bswap64
#endif

      // this compiles to a load and `bswap` which should be fast
      // measured > 33% faster than hash < rhs.hash, which compiles to `memcmp`
      std::uint64_t head     = BYTE_SWAP_64(*(std::uint64_t*)(&hash[0]));     // NOLINT
      std::uint64_t rhs_head = BYTE_SWAP_64(*(std::uint64_t*)(&rhs.hash[0])); // NOLINT
      if (head != rhs_head) return head <=> rhs_head;

      std::uint64_t mid     = BYTE_SWAP_64(*(std::uint64_t*)(&hash[8]));     // NOLINT
      std::uint64_t rhs_mid = BYTE_SWAP_64(*(std::uint64_t*)(&rhs.hash[8])); // NOLINT
      if (mid != rhs_mid) return mid <=> rhs_mid;

      std::uint32_t tail     = BYTE_SWAP_32(*(std::uint32_t*)(&hash[16]));     // NOLINT
      std::uint32_t rhs_tail = BYTE_SWAP_32(*(std::uint32_t*)(&rhs.hash[16])); // NOLINT
      return tail <=> rhs_tail;
    } else if constexpr (std::endian::native == std::endian::big) {
      // can use big_endian directly
      std::uint64_t head     = *(std::uint64_t*)(&hash[0]);     // NOLINT
      std::uint64_t rhs_head = *(std::uint64_t*)(&rhs.hash[0]); // NOLINT
      if (head != rhs_head) return head <=> rhs_head;

      std::uint64_t mid     = *(std::uint64_t*)(&hash[8]);     // NOLINT
      std::uint64_t rhs_mid = *(std::uint64_t*)(&rhs.hash[8]); // NOLINT
      if (mid != rhs_mid) return mid <=> rhs_mid;

      std::uint32_t tail     = *(std::uint32_t*)(&hash[16]);     // NOLINT
      std::uint32_t rhs_tail = *(std::uint32_t*)(&rhs.hash[16]); // NOLINT
      return tail <=> rhs_tail;
    } else {
      static_assert(std::endian::native != std::endian::big &&
                    std::endian::native != std::endian::little,
                    "mixed-endianess architectures are not supported");
    }
  }


I guess I could just fall back instead of static_assert

    } else {
      // fall back to the slow way
      hash <=> rhs.hash;
    }

Oliver Schönrock
  • 1,038
  • 6
  • 11
  • ugly ugly C casts... those are not compiler checked. C++ has at least `reinterpret_cast`, but even better would be C++20's `std::bit_cast` – JHBonarius Mar 10 '22 at 11:24
  • @MarekR I do understand what it is. I was also surprise to find a "mixed endiancess" here: https://en.cppreference.com/w/cpp/types/endian – Oliver Schönrock Mar 10 '22 at 11:33
  • @JHBonarius I don't think reinterpret_cast can do that... it would need to be const_cast(rein_cast)... and at that point... – Oliver Schönrock Mar 10 '22 at 11:35
  • @JHBonarius but `std::bit_cast` seems to work.. wasn't aware of that yet. Thanks – Oliver Schönrock Mar 10 '22 at 11:40
  • @JHBonarius Actually maybe `std::bit_cast` doesn't work after all? "This overload participates in overload resolution only if sizeof(To) == sizeof(From) and both To and From are TriviallyCopyable types. " https://en.cppreference.com/w/cpp/numeric/bit_cast... Here I am casting for a sequence of 8 `std::bytes` to a `std::uint64_t` ... can it do that? How? – Oliver Schönrock Mar 10 '22 at 11:44
  • @JHBonarius reincast can be made to work. I added a cleaner version in a self answer below with a little abstraction as well – Oliver Schönrock Mar 10 '22 at 14:36

2 Answers2

3

I suggest asserting that it's either big or little endian:

#include <bit>
#include <compare>

struct pawned_pw {
    std::strong_ordering operator<=>(const pawned_pw& rhs) const {
        static_assert(std::endian::native == std::endian::big ||
                          std::endian::native == std::endian::little,
                      "mixed-endianess architectures are not supported");

        if constexpr (std::endian::native == std::endian::little) {
            return ...;
        } else {
            // big
            return ...;
        }
    }
};
Ted Lyngmo
  • 93,841
  • 5
  • 60
  • 108
  • 1
    That's good. Thanks. Accepted. Moving outside the constexpr-if... – Oliver Schönrock Mar 10 '22 at 11:47
  • Is there a better way to do that horrible compiler detect macro juggling fro the `swap`? Or the horrible casting -- as someone else pointed out? I have tried but not found better ways, given this needs to compile down to the minimal machine code. – Oliver Schönrock Mar 10 '22 at 11:52
  • 1
    @OliverSchönrock I'm not sure if there's a better way than to do a feature check (`__cpp_lib_byteswap`) and then to fall back on your `#ifdefs`. When it comes to swapping, perhaps an option could be to swap the whole struct at once? I'm not sure how effective `std::reverse(reinterpret_cast(&struct_instance), reinterpret_cast(&struct_instance) + sizeof struct_instance);` would be. Some experimenting required :-) – Ted Lyngmo Mar 10 '22 at 12:40
  • 1
    OK, thanks. I have inspected the assembly generated by the c-casts and `_buitin_bswap` and they compile down to a single `bswap` instruction. So I suspect `std::reverse` won't beat that. Also, I only need the 1st 8 bytes to return an answer in many many cases, so that's efficient too. Good to know I am not missing something obvious. The feature test is good, make it future proof. – Oliver Schönrock Mar 10 '22 at 12:44
  • 1
    I added a cleaner version below – Oliver Schönrock Mar 10 '22 at 14:35
1

A tidied up version, based on the feedback, which also adds a touch of abstraction:

#ifdef __cpp_lib_byteswap
using std::byteswap;
#else
template <class T>
constexpr T byteswap(T n) noexcept {
// clang-format off
  // NOLINTBEGIN
  #ifdef _MSC_VER
    #define BYTE_SWAP_16 _byteswap_ushort
    #define BYTE_SWAP_32 _byteswap_ulong
    #define BYTE_SWAP_64 _byteswap_uint64
  #else
    #define BYTE_SWAP_16 __builtin_bswap16
    #define BYTE_SWAP_32 __builtin_bswap32
    #define BYTE_SWAP_64 __builtin_bswap64
  #endif
  // NOLINTEND
  // clang-format on

  if constexpr (std::same_as<T, std::uint64_t>) {
    return BYTE_SWAP_64(n);
  } else if constexpr (std::same_as<T, std::uint32_t>) {
    return BYTE_SWAP_32(n);
  } else if constexpr (std::same_as<T, std::uint16_t>) {
    return BYTE_SWAP_16(n);
  }
}
#endif

template <typename T, typename... U>
concept any_of = (std::same_as<T, U> || ...);

// convert the sizeof(Target) bytes starting at `source` pointer to Target
// uses compiler intrinsics for endianess conversion if required and if `swap` == true
// caller responsibility to ensure that enough bytes are readable/dereferencable etc
// this compiles to a load and `bswap` which is very fast and can beat eg `memcmp`
template <typename Target, bool swap_if_required = true>
Target bytearray_cast(
    const std::byte* source) requires any_of<Target, std::uint64_t, std::uint32_t, std::uint16_t> {

  static_assert(std::endian::native == std::endian::big ||
                    std::endian::native == std::endian::little,
                "mixed-endianess architectures are not supported");

  Target value = *reinterpret_cast<const Target*>(source); // NOLINT

  if constexpr (swap_if_required && std::endian::native == std::endian::little) {
    return byteswap<Target>(value);
  } else {
    return value;
  }
}

template <typename T>
std::strong_ordering
three_way(const std::byte* a,
          const std::byte* b) requires any_of<T, std::uint64_t, std::uint32_t, std::uint16_t> {
  return bytearray_cast<T>(a) <=> bytearray_cast<T>(b);
}

template <typename T>
bool equal(const std::byte* a,
           const std::byte* b) requires any_of<T, std::uint64_t, std::uint32_t, std::uint16_t> {
  // don't bother swapping for endianess, since we don't need it
  return bytearray_cast<T, false>(a) == bytearray_cast<T, false>(b);
}

struct pawned_pw {

  std::strong_ordering operator<=>(const pawned_pw& rhs) const {
    if (auto cmp = three_way<std::uint64_t>(&hash[0], &rhs.hash[0]);
        cmp != std::strong_ordering::equal)
      return cmp;

    if (auto cmp = three_way<std::uint64_t>(&hash[8], &rhs.hash[8]);
        cmp != std::strong_ordering::equal)
      return cmp;

    return three_way<std::uint32_t>(&hash[16], &rhs.hash[16]);
  }

  bool operator==(const pawned_pw& rhs) const {
    if (bool cmp = equal<std::uint64_t>(&hash[0], &rhs.hash[0]); !cmp) return cmp;
    if (bool cmp = equal<std::uint64_t>(&hash[8], &rhs.hash[8]); !cmp) return cmp;
    return equal<std::uint32_t>(&hash[16], &rhs.hash[16]);
  }

  std::array<std::byte, 20> hash;
};


Oliver Schönrock
  • 1,038
  • 6
  • 11