
I have a bunch of test vectors, presented in the form of hexadecimal strings:

MSG: 6BC1BEE22E409F96E93D7E117393172A
MAC: 070A16B46B4D4144F79BDD9DD04A287C
MSG: 6BC1BEE22E409F96E93D7E117393172AAE2D8A57
MAC: 7D85449EA6EA19C823A7BF78837DFADE

etc. I need to get these into a C++ program somehow, without too much editing required. There are various options:

  • Edit the test vectors by hand into the form 0x6B,0xC1,0xBE,...
  • Edit the test vectors by hand into the form "6BC1BEE22E409F96E93D7E117393172A" and write a function to convert that into a byte array at run time.
  • Write a program to parse the test vectors and output C++ code.

But the one I ended up using was:

  • User-defined literals,

because fun. I defined a helper class HexByteArray and a user-defined literal operator HexByteArray operator "" _$ (const char* s) that parses a string of the form "0xXX...XX", where XX...XX is an even number of hex digits. HexByteArray includes conversion operators to const uint8_t* and std::vector<uint8_t>. So now I can write e.g.

struct {
  std::vector<uint8_t> MSG ;
  const uint8_t* MAC ;
  } Test1 = {
  0x6BC1BEE22E409F96E93D7E117393172A_$,
  0x070A16B46B4D4144F79BDD9DD04A287C_$
  } ;

Which works nicely. But now here is my question: Can I do this for arrays as well? For instance:

uint8_t MAC[16] = 0x070A16B46B4D4144F79BDD9DD04A287C_$ ;

or even

uint8_t MAC[] = 0x070A16B46B4D4144F79BDD9DD04A287C_$ ;

I can't see how to make this work. To initialise an array, I would seem to need an std::initializer_list. But as far as I can tell, only the compiler can instantiate such a thing. Any ideas?
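For context, here is a minimal sketch of the obstacle (not part of my actual code; `make_mac` and `mac4` are hypothetical names). As far as I understand it, a built-in array can only be initialised from a braced list or a string literal, never from the result of a function call or conversion operator, whereas `std::array` is an ordinary class type that can be returned by value:

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: returns a fixed-size byte array by value.
constexpr std::array<std::uint8_t, 4> make_mac()
{
    return {{0x07, 0x0A, 0x16, 0xB4}};
}

// OK: std::array is a class type, so it can be copy-initialised from a call.
constexpr std::array<std::uint8_t, 4> mac4 = make_mac();

// Not OK: a built-in array cannot be initialised from an expression.
// std::uint8_t raw[4] = make_mac();   // does not compile
```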


Here is my code:

HexByteArray.h

#include <cstdint>
#include <vector>

class HexByteArray
  {
public:
  HexByteArray (const char* s) ;
  ~HexByteArray() { delete[] a ; }

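  // Hands ownership of the buffer to the caller, who must delete[] it.
  operator const uint8_t*() && { const uint8_t* t = a ; a = 0 ; return t ; }
  // The vector copies the bytes; the destructor then frees the buffer.
  operator std::vector<uint8_t>() &&
    {
    return std::vector<uint8_t> ( a, a + len ) ;
    }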

  class ErrorInvalidPrefix { } ;
  class ErrorHexDigit { } ;
  class ErrorOddLength { } ;

private:
  const uint8_t* a = 0 ;
  size_t len ;
  } ;

inline HexByteArray operator "" _$ (const char* s)
  {
  return HexByteArray (s) ;
  }

HexByteArray.cpp

#include "HexByteArray.h"

#include <cctype>
#include <cstdio>   // sscanf
#include <cstring>

HexByteArray::HexByteArray (const char* s)
  {
  if (s[0] != '0' || toupper (s[1]) != 'X') throw ErrorInvalidPrefix() ;
  s += 2 ;

  // Special case: 0x0_$ is an empty array (because 0x_$ is invalid C++ syntax)
  if (!strcmp (s, "0"))
    {
    a = nullptr ; len = 0 ;
    }
  else
    {
    for (len = 0 ; s[len] ; len++) if (!isxdigit (s[len])) throw ErrorHexDigit() ;
    if (len & 1) throw ErrorOddLength() ;
    len /= 2 ;
    uint8_t* t = new uint8_t[len] ;
    for (size_t i = 0 ; i < len ; i++, s += 2)
      sscanf (s, "%2hhx", &t[i]) ;
    a = t ;
    }
  }
TonyK
  • 1
    What about `std::array`? IIRC `template <char...> double operator "" _x();` style operators can give you the compile time size you'd need. – StoryTeller - Unslander Monica Dec 24 '18 at 13:08
  • 3
    Why not `std::array`? You could even have your literal return the desired array directly then, as `auto MAC = 0x070A16B46B4D4144F79BDD9DD04A287C_$ ;`. – Baum mit Augen Dec 24 '18 at 13:09
  • @StoryTeller, BaummitAugen: I couldn't make this work. But if you can post an answer with code, I will be very grateful! – TonyK Dec 24 '18 at 13:12
  • 1
    Sorry, no time to flesh out a complete answer right now, but here's the idea: https://wandbox.org/permlink/xPh9KJouD3WogzI7 The implementation of the operator is a stub, of course. – Baum mit Augen Dec 24 '18 at 13:25
  • 1
    Downvoters out in full force again today I see. I think this is both useful and interesting. – Lightness Races in Orbit Dec 24 '18 at 13:27
  • @LightnessRacesinOrbit I downvoted because this is a regexp question... `sed` FTW! – YSC Dec 24 '18 at 13:32
  • 4
    @YSC This question has absolutely nothing to do with regular expressions. Even if it did, why would that mean a downvote? – Lightness Races in Orbit Dec 24 '18 at 13:33
  • 1
    @LightnessRacesinOrbit I find it useless. OP is trying to find a clever solution to a non-problem. Just make `sed` crunch that data to produce a simple C source file. Compile, link and enjoy your data. – YSC Dec 24 '18 at 13:35
  • 3
    @YSC Nobody's forcing you to do it this way. Season's greetings! – Lightness Races in Orbit Dec 24 '18 at 13:37
  • @LightnessRacesinOrbit I know ^^ The question _could_ be interesting though, but this non-problem is an unwanted context. If the Q were _"I'm trying to define a UDL which produces an array so I can write `auto data = "DEADBEEF"_$;` but ... How can I solve that?"_, I would have upvoted it. – YSC Dec 24 '18 at 13:43
  • 5
    That's exactly what the question is – Lightness Races in Orbit Dec 24 '18 at 13:44
  • @BaummitAugen: That code is just the same as `char a[4] = "1234"`, isn't it? I need to parse the hexadecimal input string of length `2n` into a byte array of length `n`. – TonyK Dec 24 '18 at 14:08
  • @BaummitAugen: OK, I think I see how to do this. But if I return an array, I can't say `uint8_t* s = 0x1234_$` any more... I could solve this by having two different user-defined literals, I suppose: `const uint8_t* s = 0x1234_p$` for pointers and vectors, and `auto a = 0x1234_a$` for arrays. Thanks for your help! – TonyK Dec 24 '18 at 14:50
  • @TonyK - I must add that I also started on a template meta-programming solution to do this. ([obligatory link](http://coliru.stacked-crooked.com/a/9fad473cedd081ea)). So far I only added the check that a literal is valid. Step 2 was to turn the character sequence into a shorter sequence of pure bytes, and shove'em in an array. Sadly, duty calls. – StoryTeller - Unslander Monica Dec 24 '18 at 15:04
  • 3
    @YSC Don't downvote just because you don't happen to like the OPs approach. Post a comment or post an answer suggesting a different approach and explain why. The voting button isn't about how you like the question, it is about whether the question is well presented, clear and shows research effort. – Galik Dec 24 '18 at 16:21
  • Which standard can you use? Can you use C++17? C++14? – Justin Dec 24 '18 at 18:04
  • 1
    @Justin: I am currently compiling for C++17. – TonyK Dec 24 '18 at 18:08

3 Answers

7

Use a numeric literal operator template, with the signature:

template <char...>
result_type operator "" _x();
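As a rough illustration of why this form helps (a sketch only; the suffix `_len` is hypothetical): the compiler passes every character of the literal's spelling, including the `0x` prefix, as a template argument, so the length is available as a constant expression.

```cpp
#include <cstddef>

// Hypothetical literal: yields the number of characters in the literal's spelling.
template <char... cs>
constexpr std::size_t operator"" _len()
{
    return sizeof...(cs);
}

// "0x1234" is six characters: '0', 'x', '1', '2', '3', '4'.
static_assert(0x1234_len == 6, "the 0x prefix is part of the pack");
```

That constant is what lets the return type be a `std::array` whose size depends on the literal.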

Also, since the data is known at compile time, we might as well make everything constexpr. Note that we use std::array instead of C-style arrays, because a function cannot return a built-in array by value:

#include <cstdint>
#include <array>
#include <vector>

// Constexpr hex parsing algorithm follows:
struct InvalidHexDigit {};
struct InvalidPrefix {};
struct OddLength {};

constexpr std::uint8_t hex_value(char c)
{
    if ('0' <= c && c <= '9') return c - '0';
    // This assumes ASCII:
    if ('A' <= c && c <= 'F') return c - 'A' + 10;
    if ('a' <= c && c <= 'f') return c - 'a' + 10;
    // In constexpr-land, this is a compile-time error if execution reaches it:
    // The weird `if (c == c)` is to work around gcc 8.2 erroring out here even though
    // execution doesn't reach it.
    if (c == c) throw InvalidHexDigit{};
}

constexpr std::uint8_t parse_single(char a, char b)
{
    return (hex_value(a) << 4) | hex_value(b);
}

template <typename Iter, typename Out>
constexpr auto parse_hex(Iter begin, Iter end, Out out)
{
    if (end - begin <= 2) throw InvalidPrefix{};
    // Accept both "0x" and "0X", as the original run-time parser does:
    if (begin[0] != '0' || (begin[1] != 'x' && begin[1] != 'X')) throw InvalidPrefix{};
    if ((end - begin) % 2 != 0) throw OddLength{};

    begin += 2;

    while (begin != end)
    {
        *out = parse_single(*begin, *(begin + 1));
        begin += 2;
        ++out;
    }

    return out;
}

// Make this a template to defer evaluation until later        
template <char... cs>
struct HexByteArray {
    static constexpr auto to_array()
    {
        constexpr std::array<char, sizeof...(cs)> data{cs...};

        std::array<std::uint8_t, (sizeof...(cs) / 2 - 1)> result{};

        parse_hex(data.begin(), data.end(), result.begin());

        return result;
    }

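    // The size must match the array returned by to_array(): two chars per byte, minus the "0x" prefix.
    constexpr operator std::array<std::uint8_t, (sizeof...(cs) / 2 - 1)>() const 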
    {
        return to_array();
    }

    operator std::vector<std::uint8_t>() const
    {
        constexpr auto tmp = to_array();

        return std::vector<std::uint8_t>{tmp.begin(), tmp.end()};
    }
};

template <char... cs>
constexpr auto operator"" _$()
{
    static_assert(sizeof...(cs) % 2 == 0, "Must be an even number of chars");
    return HexByteArray<cs...>{};
}


Example usage:

auto data_array = 0x6BC1BEE22E409F96E93D7E117393172A_$ .to_array();
std::vector<std::uint8_t> data_vector = 0x6BC1BEE22E409F96E93D7E117393172A_$;

As a side note, $ in an identifier is a compiler extension (gcc accepts it), so it is not portable standard C++. Consider using a UDL suffix other than _$.

TonyK
Justin
  • This is just what I was looking for. Thank you! I am stuck with gcc version 5.3 for now, which can't handle all those `constexpr` functions. But I'm sure I can get it to work. About that `$`: yes, you are right, although I find it strange that `GLAGOLITIC CAPITAL LETTER AZU` is allowed in identifiers but `$` is not. But I'm going to keep it, because it mirrors my Basic compiler's syntax `Public S$ = &H12AB34CD56$`. – TonyK Dec 24 '18 at 23:12
  • @TonyK With gcc 5.3, I believe it should work if you simply remove all of the `constexpr`s. Probably want to do a bit of thinking to ensure there aren't too many copies – Justin Dec 24 '18 at 23:18
6

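This will do it (C++17 is required):

#include <array>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <type_traits>
#include <utility>

namespace detail{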
template <std::size_t C> constexpr std::integral_constant<std::size_t, C> int_c{ };

template <char c>
class hex_decimal_t
{
    constexpr static std::uint8_t get_value() {
        constexpr std::uint8_t k = c - '0';
        if constexpr (k >= 0 && k <= 9) { return k; }
        else if constexpr (k >= 17 && k <= 22) { return k - 7;  }
        else if constexpr (k >= 49 && k <= 54) { return k - 39; }
        else { return std::uint8_t(-1); }
    }
public:
    static constexpr std::uint8_t value = get_value();
    constexpr operator auto() const{
        return value;
    }
};
template <char C> constexpr hex_decimal_t<C> hex_decimal{ };

template <bool B> using bool_type = std::integral_constant<bool, B>;

template <char... cs> struct is_valid_hex : std::false_type { };
template <char... cs> struct is_valid_hex<'0', 'x', cs...> : bool_type<((hex_decimal<cs> != std::uint8_t(-1)) && ...)>{};
template <char... cs> struct is_valid_hex<'0', 'X', cs...> : bool_type<((hex_decimal<cs> != std::uint8_t(-1)) && ...)>{};

template <std::size_t... Is>
constexpr auto expand_over(std::index_sequence<0, Is...>)
{
    return [](auto&& f) -> decltype(auto) {
        return decltype(f)(f)(int_c<Is>...);
    };
}

template <class T,class... F>
constexpr auto select(T, F&&... f) {
    return std::get<T{}>(std::forward_as_tuple(std::forward<F>(f)...));
}
}

template <char... ds>
constexpr auto operator "" _H()
{
    // Require the 0x prefix and at least one hex digit:
    static_assert(detail::is_valid_hex<ds...>{} && sizeof...(ds) >= 3, "Not a valid hex number");
    static_assert(!(sizeof...(ds) > 3 && sizeof...(ds) & 0x1), "Hex string must have even length");

    constexpr int Sz = sizeof...(ds);

    constexpr auto expand = detail::select(detail::int_c<(Sz > 3)>,
        [] { return detail::expand_over(std::make_index_sequence<2>{}); },
        [] { return detail::expand_over(std::make_index_sequence<Sz/2>{}); }
    )();

    if constexpr (Sz <= 3) {
        return expand([](auto... Is) {
            constexpr std::array digs{ds...};
            return std::array { (detail::hex_decimal<digs[2 * Is]>)... };
        });
    } else {
        return expand([](auto... Is) {
            constexpr std::array digs{ds...};
            return std::array { ((detail::hex_decimal<digs[2 * Is]> << 4) | detail::hex_decimal<digs[2 * Is + 1]>)... };
        });
    }
}

constexpr auto arr = 0x070A16B46B4D4144F79BDD9DD04A287C_H;
static_assert(arr.size() == 16);
static_assert(std::get<0>(arr) == 0x7);
static_assert(std::get<arr.size() - 1>(arr) == 0x7C);
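The core of the approach above is pairing up hex digits by index. Here is a stripped-down sketch of just that step, assuming C++17 and ASCII, and leaving out digit validation. The names `nibble`, `pack_bytes`, `_bytes` and `bytes` are hypothetical and not part of the code above:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical helper: value of one hex digit (assumes valid input and ASCII).
constexpr std::uint8_t nibble(char c)
{
    return (c >= '0' && c <= '9') ? c - '0'
         : (c >= 'a' && c <= 'f') ? c - 'a' + 10
         : c - 'A' + 10;
}

// Is = 0, 1, ..., count-1; byte i is built from digits 2+2i and 3+2i
// (the offset 2 skips the "0x" prefix).
template <std::size_t N, std::size_t... Is>
constexpr std::array<std::uint8_t, sizeof...(Is)>
pack_bytes(const std::array<char, N>& digits, std::index_sequence<Is...>)
{
    return {{ static_cast<std::uint8_t>(
        (nibble(digits[2 + 2 * Is]) << 4) | nibble(digits[3 + 2 * Is]))... }};
}

template <char... cs>
constexpr auto operator"" _bytes()
{
    static_assert(sizeof...(cs) >= 4 && sizeof...(cs) % 2 == 0,
                  "expected 0x followed by an even number of hex digits");
    constexpr std::array<char, sizeof...(cs)> digits{{cs...}};
    return pack_bytes(digits, std::make_index_sequence<sizeof...(cs) / 2 - 1>{});
}

constexpr auto bytes = 0x070A16B4_bytes;
static_assert(bytes.size() == 4, "two hex digits per byte");
static_assert(bytes[0] == 0x07 && bytes[3] == 0xB4, "digits are paired left to right");
```

Unlike the full version, this sketch does not handle a single-digit literal such as 0x7.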


Jans
  • Thank you! Unfortunately I can only accept one answer, otherwise I would accept this too. – TonyK Dec 24 '18 at 23:31
1

A fully compile-time version, checked with static_assert, based on @Justin's answer.

The UDL operator returns a std::array directly. Using the HexArrayBuilder implementation class, you can just as easily define other UDL operators that return std::tuple<std::integral_constant<char, c>...> or even a vector.

This is the C++11 version. In C++17, the static_assert that checks character validity can be written as in the commented-out line.

#include <array>
#include <type_traits>
#include <tuple>

struct HexArrayHelper {
    static constexpr bool valid(char c) { return ('0' <= c && c <= '9') || ('A' <= c && c <= 'F') || ('a' <= c && c <= 'f'); }
    static constexpr char hex_value(char c) {
        return ('0' <= c && c <= '9') ? c - '0'
             : ('A' <= c && c <= 'F') ? c - 'A' + 10
             : c - 'a' + 10;
    }
    static constexpr char build(char a, char b) {
        return (hex_value(a) << 4) + hex_value(b);
    }
};

template <char... cs>
struct HexArray {
    static constexpr std::array<char, sizeof...(cs)> to_array() { return {cs...}; }
    static constexpr std::tuple<std::integral_constant<char, cs>...> to_tuple() { return {}; }
};

template <typename T, char... cs>
struct HexArrayBuilder : T {};

template <char... built, char a, char b, char... cs>
struct HexArrayBuilder<HexArray<built...>, a, b, cs...> : HexArrayBuilder<HexArray<built..., HexArrayHelper::build(a, b)>, cs...> {
    static_assert(HexArrayHelper::valid(a) && HexArrayHelper::valid(b), "Invalid hex character");
};

template <char zero, char x, char... cs>
struct HexByteArray : HexArrayBuilder<HexArray<>, cs...> {
    static_assert(zero == '0' && (x == 'x' || x == 'X'), "Invalid prefix");
    // static_assert(std::conjunction<std::bool_constant<HexArrayHelper::valid(cs)>...>::value, "Invalid hex character");
};

template <char... cs>
constexpr auto operator"" _hexarr() -> std::array<char, sizeof...(cs) / 2 - 1> {
    static_assert(sizeof...(cs) % 2 == 0 && sizeof...(cs) >= 2, "Must be an even number of chars");
    return HexByteArray<cs...>::to_array();
}

auto x = 0X1102030405060708abcdef_hexarr;
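For the C++17 variant of the character check mentioned above, a fold expression is a shorter alternative to `std::conjunction`. A self-contained sketch, assuming ASCII, with the hypothetical names `is_hex_digit` and `all_hex`:

```cpp
// Hypothetical stand-in for HexArrayHelper::valid (assumes ASCII).
constexpr bool is_hex_digit(char c)
{
    return ('0' <= c && c <= '9') || ('A' <= c && c <= 'F') || ('a' <= c && c <= 'f');
}

// C++17 fold expression: true only if every character in the pack is a hex digit.
// An empty pack folds to true.
template <char... cs>
constexpr bool all_hex = (is_hex_digit(cs) && ...);

static_assert(all_hex<'a', 'B', '7'>, "all three are hex digits");
static_assert(!all_hex<'a', 'g'>, "'g' is not a hex digit");
```

Inside HexByteArray, the check would then read static_assert(all_hex<cs...>, "Invalid hex character").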
YumeYao