Splitting a string of bytes to vector of BYTES in C++

Question

I have a string of bytes that looks like the following:

"1,3,8,b,e,ff,10"

How would I split this string into an std::vector of BYTEs containing the following values:

[ 0x01, 0x03, 0x08, 0x0b, 0x0e, 0xff, 0x10 ]

I'm trying to split the string using ',' as the delimiter, but I'm having some trouble getting this to work. Can someone give me a helping hand on how to accomplish this?

So I have tried this:

    std::istringstream iss("1 3 8 b e ff 10");
    BYTE num = 0;
    while(iss >> num || !iss.eof()) 
    {
        if(iss.fail()) 
        {
            iss.clear();
            std::string dummy;
            iss >> dummy;
            continue;
        }
        dataValues.push_back(num);
    }

But this pushes the ASCII character codes into the vector:

49 //1
51 //3
56 //8
98 //b
101 //e
102 //f
102 //f
49 //1
48 //0

I'm instead trying to fill the vector with:

 0x01
 0x03
 0x08
 0x0b
 0x0e
 0xff
 0x10

You should probably post the relevant part of your non-working code so that people here can help you fix it. — Paul R, Jul 24 '14 at 16:01
Use [`std::istringstream`](http://en.cppreference.com/w/cpp/io/basic_istringstream) in conjunction with the [`std::hex`](http://en.cppreference.com/w/cpp/io/manip/hex) I/O manipulator. Skipping the `,` characters can be done as [shown here](http://stackoverflow.com/a/24520662/1413395). — πάντα ῥεῖ, Jul 24 '14 at 16:02
@πάνταῥεῖ I just tried that but it's not taking the correct values. i edited the post to explain what i mean — user3330644, Jul 24 '14 at 17:00
@user3330644 You're missing the call to `iss >> std::hex;` before the `while` loop, as I mentioned. Alternatively, write `while(iss >> std::hex >> num || !iss.eof()) `. Also note that `BYTE` is just a typedef for `unsigned char`, so you should read into an `unsigned int` first. — πάντα ῥεῖ, Jul 24 '14 at 19:30

score 1 · Accepted Answer · edited May 23 '17 at 12:27

You only need to adapt the linked answer from my comment to your use case in a few small ways:

    std::istringstream iss("1,3,8,b,e,ff,10");
    std::vector<BYTE> dataValues;

    unsigned int num = 0; // read an unsigned int in 1st place
                          // BYTE is just a typedef for unsigned char
    while(iss >> std::hex >> num || !iss.eof()) {
        if(iss.fail()) {
            iss.clear();
            char dummy;
            iss >> dummy; // use char as dummy if no whitespaces 
                          // may occur as delimiters
            continue;
        }
        if(num <= 0xff) {
            dataValues.push_back(static_cast<BYTE>(num));
        }
        else {
            // Error single byte value expected
        }
    }

You can see the fully working sample here on ideone.
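
To illustrate why the number is read into an unsigned int first, here is a minimal sketch. The helper names readAsChar and readAsNumber are made up for this illustration and are not part of the code above. Stream extraction treats unsigned char as a character type, so it stores the code of a single character and std::hex has no effect on it; extraction into an unsigned int parses the whole hex token.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extracts into an unsigned char.
// operator>> treats unsigned char as a character type, so this returns the
// code of the first non-whitespace character, not a parsed number.
unsigned char readAsChar(const std::string& text) {
    std::istringstream iss(text);
    unsigned char c = 0;
    iss >> std::hex >> c;   // std::hex does not affect character extraction
    return c;
}

// Hypothetical helper: extracts into an unsigned int, so the token is
// parsed as a hexadecimal number.
unsigned int readAsNumber(const std::string& text) {
    std::istringstream iss(text);
    unsigned int n = 0;
    iss >> std::hex >> n;
    return n;
}
```

With the input "ff", readAsChar returns the character 'f' (102 in ASCII, as in the question's output), while readAsNumber returns 0xff.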

NetVipeC · Answer 2 · 2014-07-24T17:35:23.650

A working code sample (tested with GCC 4.9.0 in C++11 mode):

The file save.txt contains 1,3,8,b,e,ff,10 as its first and only line.

Output:

1
3
8
b
e
ff
10

The idea is:

Use std::getline to read line by line.
Use boost::split to split the line according to the separator.
Use std::stringstream to convert from hex string to unsigned char.

Code:

#include <fstream>
#include <iostream>   // std::cout
#include <sstream>    // std::stringstream
#include <string>
#include <vector>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>

int main(int argc, char* argv[]) {
    std::ifstream ifs("e:\\save.txt");

    std::string line;
    std::vector<std::string> tokens;
    std::getline(ifs, line);
    boost::split(tokens, line, boost::is_any_of(","));

    std::vector<unsigned char> values;
    for (const auto& t : tokens) {
        unsigned int x;
        std::stringstream ss;
        ss << std::hex << t;
        ss >> x;

        values.push_back(x);
    }

    for (auto v : values) {
        std::cout << std::hex << (unsigned long)v << std::endl;
    }

    return 0;
}
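
If Boost is not available, the same split-then-convert idea can probably be done with the standard library alone. The following is only a sketch under that assumption: parseHexLine is a hypothetical name, std::getline with a ',' delimiter stands in for boost::split, and std::stoul with base 16 stands in for the stringstream conversion. Unlike the code above, it throws on tokens that are empty or too large for one byte.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of the answer above): splits `line` on ','
// and parses each token as a hexadecimal byte.
std::vector<unsigned char> parseHexLine(const std::string& line) {
    std::vector<unsigned char> values;
    std::istringstream tokens(line);
    std::string token;
    while (std::getline(tokens, token, ',')) {
        // std::stoul throws std::invalid_argument if the token has no digits.
        unsigned long x = std::stoul(token, nullptr, 16);
        if (x > 0xff) {
            throw std::out_of_range("token does not fit in one byte: " + token);
        }
        values.push_back(static_cast<unsigned char>(x));
    }
    return values;
}
```

Calling parseHexLine("1,3,8,b,e,ff,10") should give a vector of seven bytes ending in 0xff and 0x10.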

odinthenerd · Answer 3 · 2014-07-26T11:29:55.813

Just to demonstrate another, probably much faster, way of doing this: consider reading everything into an array and using a custom iterator to do the conversion.

class ToHexIterator : public std::iterator<std::input_iterator_tag, int>{
    char* it_;
    char* end_;
    int current_;
    bool isHex(const char c){
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }
    char toUpperCase(const char c){
        if (c >= 'a' && c <= 'f'){
            return (c - 'a') + 'A';
        }
        return c;
    }
    int toNibble(const char c){
        auto x = toUpperCase(c);
        if (x >= '0' && x <= '9'){
            return x - '0';
        }
        else {
            return (x - 'A') + 10;
        }
    }
public:
    ToHexIterator() :it_{ nullptr }, end_{ nullptr }, current_{}{}                  //default constructed means end iterator
    ToHexIterator(char* begin, char* end) :it_{ begin }, end_{ end }, current_{}{
        while (it_ != end_ && !isHex(*it_)){ ++it_; }  //check the bound first, then skip ahead to the first hex digit
        ++(*this);
    }
    bool operator==(const ToHexIterator &other){
        return it_ == nullptr && end_ == nullptr && other.it_ == nullptr && other.end_ == nullptr;
    }
    bool operator!=(const ToHexIterator &other){
        return !(*this == other);
    }
    int operator*(){
        return current_;
    }
    ToHexIterator & operator++(){
        current_ = 0;
        if (it_ != end_) {
            while (it_ != end_ && isHex(*it_)){   //test the bound before dereferencing
                current_ <<= 4;
                current_ += toNibble(*it_);
                ++it_;
            }
            while (it_ != end_ && !isHex(*it_)){ ++it_; }   //skip delimiters up to the next value
        }
        else {
            it_ = nullptr;
            end_ = nullptr;
        }
        return *this;
    }
    ToHexIterator operator++(int){
        ToHexIterator temp(*this);
        ++(*this);
        return temp;
    }
};

The basic use case would look like:

char in[] = "1,3,8,b,e,ff,10,--";
std::vector<int> v;
std::copy(ToHexIterator{ std::begin(in), std::end(in) }, ToHexIterator{}, std::back_inserter(v));

Note that it may be faster to use a lookup table to do the ASCII to hex nibble conversion.
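
A rough sketch of what such a table could look like follows. The names makeNibbleTable and toNibbleLut are invented for illustration, the table is built at run time on first use rather than at compile time, and, like toNibble above, it assumes the letters 'a' to 'f' and 'A' to 'F' have consecutive codes. Whether it is actually faster would need measuring.

```cpp
#include <array>

// Hypothetical helper: builds a 256-entry table mapping a character code to
// its nibble value, or -1 for characters that are not hex digits.
std::array<signed char, 256> makeNibbleTable() {
    std::array<signed char, 256> table;
    table.fill(-1);
    for (int i = 0; i < 10; ++i) {
        table['0' + i] = static_cast<signed char>(i);
    }
    for (int i = 0; i < 6; ++i) {
        table['a' + i] = static_cast<signed char>(10 + i);
        table['A' + i] = static_cast<signed char>(10 + i);
    }
    return table;
}

// Hypothetical replacement for toNibble: a single indexed load.
// The cast avoids a negative index on platforms where char is signed.
int toNibbleLut(char c) {
    static const std::array<signed char, 256> table = makeNibbleTable();
    return table[static_cast<unsigned char>(c)];
}
```

Because non-hex characters map to -1, the same table could also replace isHex with a check such as toNibbleLut(c) >= 0.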

Speed can be VERY dependent on compiler optimization and platform. However, because some of the istringstream functions are implemented as virtuals or pointers to functions (depending on the standard library implementation), the optimizer has trouble with them. In my code there are no virtuals or function pointers, and the only loop is inside the std::copy implementation, which the optimizer is used to dealing with. It's also generally faster to loop until two addresses are equal than to loop until the thing some changing pointer points to is equal to something. At the end of the day it's all speculation and voodoo, but on MSVC13 on my machine mine is about 10x faster. Here is a live example on GCC, http://ideone.com/nuwu15, where mine is somewhere between 3x and 10x faster depending on the run and on which test goes first (probably because of some caching effects).

All in all there is undoubtedly more room for optimization etc. and anyone who says "mine is always faster" at this level of abstraction is selling snake oil.

Update: using a compile time generated look up table increases speed further: http://ideone.com/ady8GY (note that I increased the size of the input string to decrease noise so this is not directly comparable to the above example)

What actually convinces you, that this code should work faster than the standard stream `operator>>` and hex parsing implementation? — πάντα ῥεῖ, Jul 24 '14 at 21:08
I added some more explanation and a live example measuring the timing of my implementation and an approximation of your implementation (I used ints in both rather than unsigned char but it should not change much) — odinthenerd, Jul 25 '14 at 12:35
