My case is as follows:

  1. I have a binary file that I read with the std::fstream read operation as (char*).
  2. My goal is to take every byte from the file, format it as hex, and append it to a string variable.
  3. The string variable should hold the entire content of the file, formatted as per item 2.

For example, let's say I have the following binary file content:

D0 46 98 57 A0 24 99 56 A3

The way I'm formatting each byte is as follows:

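stringstream fin;

for (size_t i = 0; i < fileb_size; ++i)
{
    // go through unsigned char so bytes >= 0x80 are not sign-extended;
    // uppercase is needed to get "D0" rather than "d0"
    fin << hex << uppercase << setfill('0') << setw(2)
        << static_cast<unsigned>(static_cast<unsigned char>(fileb[i]));
}

// for the sample bytes above this yields "D0469857A0249956A3"

return fin.str();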

The above approach works as expected; however, it is extremely slow for large files, which I understand, since stringstream is meant for input formatting!

My question is: are there ways to optimize this code, or the approach I'm taking altogether? My only constraint is that the output should be in string format as shown above.

Thank you.

Xigma
  • iostreams is not designed for efficiency. For specific use cases, you can't beat native implementation-specific file I/O. Or, in the worst case, take C's `FILE *`, and roll your own, hand-optimized hex conversion code. Couldn't beat that for efficiency. It's not rocket science. At least back in the days when I learned C and C++, this was a typical homework assignment: read hexadecimal, and convert it. – Sam Varshavchik Dec 30 '17 at 01:20
  • “Extremely slow” compared to what? – Pete Becker Dec 30 '17 at 01:21
  • Did you try [reserving size](https://stackoverflow.com/q/1941064/995714) for the output string? You already know the size of the input, so it's easy to calculate the size of the output and avoid reallocating the array many times. – phuclv Dec 30 '17 at 01:27
  • @LưuVĩnhPhúc I tried that, but no improvement! – Xigma Dec 30 '17 at 01:30
  • @PeteBecker Not an apples-to-apples comparison, but compared to a non-stringstream-based approach, where I don't have to store the result in a string object. – Xigma Dec 30 '17 at 01:33
  • @Xigma -- I agree with Sam V. If you want speed, use your OS functions provided to you for I/O management, or get a C++ based I/O library geared towards speed for the OS it runs on. – PaulMcKenzie Dec 30 '17 at 01:36
  • `std::stringstream` is rather slow, I recommend hand crafting the hex conversion and appending to a preallocated `std::string` – Galik Dec 30 '17 at 01:53
  • Full credit for doing the right thing: Starting with the stupidest, simplest, solution that could work. Too slow? Well, life's like that sometimes, but still, no sense starting by doing it the hard way. – user4581301 Dec 30 '17 at 02:08
  • one classical solution is to trade space for speed ... use a string filled with 256 of 2-char-hex-values, use (2*fileb[i]) as index into this string/table. – 2785528 Dec 30 '17 at 02:15
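
The lookup-table idea from the last comment could be sketched roughly as follows. This is only an illustration under assumptions: `make_hex_table` and `hex_via_table` are made-up names, and `fileb` / `fileb_size` simply mirror the buffer from the question.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds a 512-character table in which characters
// [2*b, 2*b + 1] are the two uppercase hex digits of byte value b.
std::array<char, 512> make_hex_table()
{
    static const char digits[] = "0123456789ABCDEF";
    std::array<char, 512> table{};
    for (std::size_t b = 0; b < 256; ++b)
    {
        table[2 * b]     = digits[b >> 4];   // high nibble
        table[2 * b + 1] = digits[b & 0x0F]; // low nibble
    }
    return table;
}

// Hypothetical name; fileb / fileb_size mirror the question's buffer.
std::string hex_via_table(const char* fileb, std::size_t fileb_size)
{
    assert(fileb != nullptr || fileb_size == 0);

    // Built once, on first call.
    static const std::array<char, 512> table = make_hex_table();

    std::string out;
    out.reserve(fileb_size * 2); // two output characters per input byte

    for (std::size_t i = 0; i < fileb_size; ++i)
    {
        // Convert through unsigned char so bytes >= 0x80 index correctly.
        const std::size_t b = static_cast<unsigned char>(fileb[i]);
        out.append(&table[2 * b], 2);
    }
    return out;
}
```

Each byte then costs one table lookup and one two-character append, with no per-nibble branching. Whether that is actually faster than computing the digits on the fly depends on the compiler and the data, so it is worth measuring.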

1 Answer

std::stringstream is rather slow. It won't preallocate, and retrieving the result with str() always copies the string at least once. The conversion to hex could also be hand-coded to be faster.

I think something like this might be more performant:

#include <cassert>
#include <fstream>
#include <string>

// Quick and dirty
char to_hex(unsigned char nibble)
{
    assert(nibble < 16);

    if(nibble < 10)
        return char('0' + nibble);

    return char('A' + nibble - 10);
}

std::string to_hex(std::string const& filename)
{
    // open file at end
    std::ifstream ifs(filename, std::ios::binary|std::ios::ate);

    // calculate file size and move to beginning
    auto end = ifs.tellg();
    ifs.seekg(0, std::ios::beg);
    auto beg = ifs.tellg();

    // preallocate the string
    std::string out;
    out.reserve((end - beg) * 2);

    char buf[2048]; // larger = faster (within limits)

    while(ifs.read(buf, sizeof(buf)) || ifs.gcount())
    {
        for(std::streamsize i = 0; i < ifs.gcount(); ++i)
        {
            out += to_hex(static_cast<unsigned char>(buf[i]) >> 4); // top nibble
            out += to_hex(static_cast<unsigned char>(buf[i]) & 0b1111); // bottom nibble
        }
    }

    return out;
}

It appends to a pre-allocated string to minimize copying and avoid reallocations.
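
To see the effect of the preallocation in isolation, here is a small in-memory sketch. It is not part of the code above: `hex_reserved` is a made-up name, and the check relies on the usual behavior of common standard library implementations, where appending within the reserved capacity does not move the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the answer's code.
std::string hex_reserved(const unsigned char* bytes, std::size_t count)
{
    static const char digits[] = "0123456789ABCDEF";

    std::string out;
    out.reserve(count * 2);                // at most one allocation, up front
    const char* const buffer = out.data(); // where the storage lives now

    for (std::size_t i = 0; i < count; ++i)
    {
        out += digits[bytes[i] >> 4];   // top nibble
        out += digits[bytes[i] & 0x0F]; // bottom nibble
    }

    // The size never exceeded the reserved capacity, so the buffer
    // is expected to be in the same place as before the loop.
    assert(out.data() == buffer);
    return out;
}
```

Without the `reserve` call the string would have to grow several times as it fills up, copying its contents each time.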

Galik
  • Thanks @Galik for the detailed solution. What a remarkable performance improvement with this approach. – Xigma Dec 30 '17 at 02:44