
I'm trying to modernize some C++ code, adhering to the C++ Core Guidelines and post-C++11 advice. The specific guideline I'm addressing here is to use <algorithm> facilities in lieu of raw loops that apply a static operation across a sequence to produce a new sequence.

This first example illustrates success (as I define it in this context). Two input vectors of std::byte come in and one comes out, representing the pairwise bitwise XOR of the two input vectors, leaving the inputs unmodified. The algorithm that fits here is std::transform.

vector<byte> XORSmash(const vector<byte>& first, const vector<byte>& second)
{
    if (first.size() != second.size())
        throw std::invalid_argument("XORSMASH: input vectors were not of equal length\n");

    vector<byte> convolution; convolution.reserve(first.size());

    transform(first.cbegin(), first.cend(), second.cbegin(), back_inserter(convolution),
        [](const byte byte1, const byte byte2) {return byte1 ^ byte2;} );

    return convolution;
}

However, there's another function for which I'm having trouble devising a non-loop solution that isn't markedly worse than the loop one. This function takes in a string of hex chars (each char ultimately conveys 4 bits of value) and generates a vector<byte>, each element of which holds the contents of two hex chars, one in the high 4 bits and one in the low. What CharToHexByte does exactly isn't pertinent (I'll include it if it becomes necessary), just that it takes a compliant hex character and returns a std::byte holding the character's numeric value, i.e. 0-15, in the low 4 bits only. The issue is that the input string consists of pairs of hex chars (each a nibble of value), and each pair unifies into a single byte. To my knowledge I cannot use std::transform, since the input iterator would have to advance by 2 each iteration (const_iterator += 2 in this case) to extract the next pair of chars from the input string.

TL;DR: Is there an algorithmic way to implement the following function without the exposed for loop that isn't markedly more expensive or verbose than the solution below?

vector<byte> UnifyHexNibbles(const string& hexStr)
{
    if (hexStr.size() % 2)
        throw std::invalid_argument("UnfyHxNbl: Input String Indivisible by 8bits. Pad if applicable.\n");

    vector<byte> hexBytes; hexBytes.reserve(hexStr.size() >> 1);
    //can I be eliminated elegantly?
    for (size_t left(0), right(1); right < hexStr.size(); left += 2, right += 2)
        hexBytes.push_back( CharToHexByte(hexStr[left]) << 4 | CharToHexByte(hexStr[right]) );

    return hexBytes;
}
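
For reference, a minimal sketch of what CharToHexByte might look like. This is an assumed stand-in, not the original helper: the body, the parameter name, and the choice to throw on a non-hex character are all illustrative, and the letter arithmetic assumes a character set where 'a'-'f' and 'A'-'F' are contiguous (true for ASCII).

```cpp
#include <cstddef>    // std::byte
#include <stdexcept>  // std::invalid_argument

// Hypothetical stand-in for CharToHexByte: maps one hex digit
// ('0'-'9', 'a'-'f', 'A'-'F') to a std::byte holding its value (0-15)
// in the low four bits. Any other character throws.
std::byte CharToHexByte(const char hexChar)
{
    if (hexChar >= '0' && hexChar <= '9')
        return static_cast<std::byte>(hexChar - '0');
    if (hexChar >= 'a' && hexChar <= 'f')
        return static_cast<std::byte>(hexChar - 'a' + 10);
    if (hexChar >= 'A' && hexChar <= 'F')
        return static_cast<std::byte>(hexChar - 'A' + 10);
    throw std::invalid_argument("CharToHexByte: not a hex character");
}
```

With a helper of this shape, `CharToHexByte(hi) << 4 | CharToHexByte(lo)` yields a std::byte directly, since std::byte provides its own shift and bitwise-or operators.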
schulmaster
  • You can adapt the iterator to increase by more than one when you increment them: https://stackoverflow.com/questions/5685983/skipping-iterator – NathanOliver Oct 10 '18 at 21:30
  • @NathanOliver I thought of that but going boost, or specializing an iterator, IMO, both classify as more verbose than the one liner loop. If this pattern were pervasive, then the adapt/3rd party solution would be more compelling. I'm looking at this case in a vacuum though. – schulmaster Oct 10 '18 at 21:33
  • No worries. I just wanted to present the option in case you weren't aware. – NathanOliver Oct 10 '18 at 21:33
  • Bit operations, hex conversions... looks like a good case for old-good raw loop pal to me. – Mikhail Oct 22 '18 at 17:49

2 Answers


With range-v3, it would be

std::vector<std::byte>
UnifyHexNibbles(const std::string& hexStr)
{
    if (hexStr.size() % 2)
        throw std::invalid_argument("size indivisible by 2.");


    return hexStr
        | ranges::view::chunk(2)
        | ranges::view::transform([](const auto& r)
           {
              return std::byte(CharToHexByte(r[0]) << 4 | CharToHexByte(r[1]));
           });
}


Jarod42
  • I feel this is akin to turning left three times just to avoid going right. This still has the vulnerabilities associated with loops: subscripting by a constant predicated upon a size (the chunk view), with the ability to change one without modifying the other. Somewhat pedantically, it's also more text. Lastly, despite range adapters implementing deferred execution, which is great from a minimize-copies/intermediates standpoint, it is hard to believe that this range adapter chain will be close to the perf of the raw loop (based on impression, not implementation knowledge). – schulmaster Oct 10 '18 at 23:21
  • On the other hand, this answer increases my confidence that the answer to my original question is simply "no": the specific case is too specific to be handled by any STL algorithm, since those are optimized for the general case of consuming a sequence consecutively. – schulmaster Oct 10 '18 at 23:23
  • *"Somewhat pedantically, it's also more text."* Yes, as I added a lot of spacing, but there are fewer characters/symbols than in the original :-) (and I fully qualified the names). I honestly find the range version easier to read. I would have wanted a `chunk<2>` version, which would allow checking `r[0]`/`r[1]` (with `get<0>(r)`). For performance, compile time should be worse I think, but for runtime I expect it to be similar (maybe ranges make the code more difficult to optimize). – Jarod42 Oct 10 '18 at 23:54
  • I see your point about symbol count. However, the `for` loop aligns with a pervasive familiarity; each of the three required expressions is well known and conveys a well-established evaluation order: `for(scoped init only once; pre loop bool_stop; post_loop actions)`. The blessing and the curse of the construct is user control over all three stages independently. I believe that contingency is responsible for the 'raw-loop-elimination' paradigm. Solutions that don't require a user loop, but allow for the same degree of user error, do not align with that sentiment. – schulmaster Oct 11 '18 at 07:19
  • @schulmaster All the sequential algorithms *should* only advance the iterators. You then write iterator adaptors to do things like this. The ranges proposal just makes it nicer to *output* a sequence (so it can be the input of the next step). – Caleth Oct 11 '18 at 08:55

There is no <algorithm> facility that allows for transformation via non-consecutive input consumption using non-specialized iterators. Aside from specializing an iterator, there are third-party (and hopefully soon-to-be-standard) alternatives/enhancements to the present core STL, such as ranges (see the range-v3 repository). See @Jarod42's answer for a working example with ranges.
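
To make the "specialize an iterator" route concrete, here is a minimal C++17 sketch. `NibblePairIterator` is a hypothetical name invented for this illustration, and the `CharToHexByte` shown is an assumed stand-in, since the original helper isn't part of the question. The adaptor yields one (high, low) char pair per dereference and advances two chars per increment, which lets plain std::transform consume the string pairwise.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Assumed stand-in for the question's CharToHexByte (not the original).
std::byte CharToHexByte(const char c)
{
    if (c >= '0' && c <= '9') return static_cast<std::byte>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<std::byte>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<std::byte>(c - 'A' + 10);
    throw std::invalid_argument("CharToHexByte: not a hex character");
}

// Hypothetical adaptor: a read-only input iterator over a string that
// yields consecutive (high, low) char pairs and steps two chars at a time.
// It must only be used on ranges with an even number of chars.
class NibblePairIterator
{
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = std::pair<char, char>;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const value_type*;
    using reference         = value_type;

    explicit NibblePairIterator(std::string::const_iterator pos) : pos_(pos) {}

    value_type operator*() const { return {*pos_, *(pos_ + 1)}; }
    NibblePairIterator& operator++() { pos_ += 2; return *this; }
    NibblePairIterator operator++(int) { auto old = *this; ++*this; return old; }
    bool operator==(const NibblePairIterator& other) const { return pos_ == other.pos_; }
    bool operator!=(const NibblePairIterator& other) const { return pos_ != other.pos_; }

private:
    std::string::const_iterator pos_;
};

std::vector<std::byte> UnifyHexNibbles(const std::string& hexStr)
{
    // An odd length would make the adaptor step past the end, so reject it.
    if (hexStr.size() % 2)
        throw std::invalid_argument("UnifyHexNibbles: odd number of hex chars");

    std::vector<std::byte> hexBytes;
    hexBytes.reserve(hexStr.size() / 2);

    std::transform(NibblePairIterator(hexStr.cbegin()), NibblePairIterator(hexStr.cend()),
                   std::back_inserter(hexBytes),
                   [](const std::pair<char, char>& p)
                   { return CharToHexByte(p.first) << 4 | CharToHexByte(p.second); });

    return hexBytes;
}
```

Under these assumptions, `UnifyHexNibbles("1fA0")` produces the two bytes 0x1f and 0xa0. The loop is gone from the call site, but the stride of 2 now lives inside the adaptor, so this is considerably more text than the raw loop for a one-off use.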

schulmaster