I'm having trouble reading a binary file into a bitset and processing it.

std::ifstream is("data.txt", std::ifstream::binary);
if (is) {
    // get length of file:
    is.seekg(0, is.end);
    int length = is.tellg();
    is.seekg(0, is.beg);

    char *buffer = new char[length];

    is.read(buffer, length);
    is.close();

    const int k = sizeof(buffer) * 8;
    std::bitset<k> tmp;
    memcpy(&tmp, buffer, sizeof(buffer));

    std::cout << tmp;

    delete[] buffer;
}
int a = 5;
std::bitset<32> bit;
memcpy(&bit, &a, sizeof(a));
std::cout << bit;

I want to get {05 00 00 00} (hex memory view), i.e. bitset[0~31]={00000101 00000000 00000000 00000000}, but I get bitset[0~31]={10100000 00000000 00000000 00000000}.

tom
  • why do you expect to get that? – user253751 Sep 09 '22 at 13:20
  • Your code has undefined behavior. `memcpy` requires trivially copyable types, which `bitset` isn't guaranteed to be. – NathanOliver Sep 09 '22 at 13:24
  • Also, `buffer` is a pointer. `sizeof(buffer)` is the size of a pointer. – Drew Dormann Sep 09 '22 at 13:25
  • `sizeof(buffer) * 8;` is the bit count of a *pointer*, not of what it points to. I suspect that wasn't your goal here. Between that and the hard memory copy into a `std::bitset` object, there are all sorts of 'wrong' going on in this code. – WhozCraig Sep 09 '22 at 13:26
  • `const int k = sizeof(buffer) * 8;` won't do what you expect. It is a constant expression equal to 8 times the size of a pointer. It isn't possible to have a `std::bitset` whose size is based on runtime information. – François Andrieux Sep 09 '22 at 13:26
  • bits are traditionally numbered right-to-left – user253751 Sep 09 '22 at 13:33
  • Aside from the oddity of a file with the extension ".txt" being read as binary, where did this data come from? The only **portable** use of binary files is to read data that was written by the same application. – Pete Becker Sep 09 '22 at 14:23
  • Minor point about portability: instead of multiplying the number of bytes by 8, use `CHAR_BIT`. You might, some day, run into a system that uses something other than 8 bits. – Pete Becker Sep 09 '22 at 14:25

1 Answer

You need to learn how to crawl before you can crawl on broken glass.

In short, computer memory is an opaque box, and you should stop making assumptions about it.

Hyrum's law is the stupidest thing that has ever existed and if you stopped proliferating this cancer, that would be great.

What I'm about to write is common sense to every single competent C++ programmer out there. It's as trivial as breathing, and as important. It should be included in every C++ book ever written and hammered into the heads of new programmers as soon as possible but, for some reason, it isn't.

The only thing you can rely on when it comes to what I'm going to loosely define as "memory" is that the bits of a byte are never out of order. std::byte is such a type. Before it was added to the standard in C++17, we used unsigned char; the two are more or less interchangeable, but you should prefer std::byte whenever you can.

So, what do I mean by this?

std::byte a{0b10101000}; // std::byte has no implicit conversion from int
assert((std::to_integer<int>(a >> 3) & 1) == 1); // always true

That's it. Everything else is up to the compiler, your machine architecture and the stars in the sky.

Oh, what, you thought you could just write int a = 0b1010100000000010; and expect something good? I'm sorry, but that's just not how things work in these savage lands. If you expect any order here, you will have to split it into bytes yourself; you cannot just cast this into std::byte bytes[2] and expect bytes[0] == 0b10101000. It is NEVER correct to assume anything here. If you do, one day your code will break, and by the time you realize that it's broken it will be too late, because it will be yet another undebuggable 30-million-line legacy codebase, half of which is only available as proprietary shared objects that we haven't had the source code for since 1997. Good luck.

So, what's the correct way? Luckily for us, binary shifts are architecture independent. int is guaranteed to be at least 16 bits wide, and that's the only thing this example relies on, although most machines have sizeof (int) == 4. If you need more bits, or an exact number of bits, you should use the appropriate type from the fixed width integer types (<cstdint>).

#include <climits> // CHAR_BIT
#include <cstddef> // std::byte

int a = 0b1010100000000010;
std::byte bytes[2]; // always correct
// std::byte bytes[4]; // stupid assumption by inexperienced programmers
// std::byte bytes[sizeof (a)]; // flexible solution that needs more work
// we think in terms of 8 bits, don't care about the rest
// (std::byte has no implicit conversion from int, hence the casts)
bytes[0] = static_cast<std::byte>(a & 0xFF);
// we need to skip possibly more than 8 bits to access next 8 bits however
bytes[1] = static_cast<std::byte>((a >> CHAR_BIT) & 0xFF);

This is the only absolutely correct way to convert a value with sizeof (T) > 1 into an array of bytes, and if you see anything else, then it's without a doubt a subpar implementation that will stop working the moment you change the compiler and/or machine architecture.
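As a rough sketch of the same idea with a fixed width type, the shifts can be wrapped in a small function. The name to_bytes_le and the choice of "least significant group first" are assumptions made for illustration, not something the code above prescribes; this version always steps 8 bits at a time, so each array element holds one 8-bit group.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: split a 32-bit value into four 8-bit groups,
// least significant group first, using only shifts and masks.
std::array<std::byte, 4> to_bytes_le(std::uint32_t value) {
    std::array<std::byte, 4> out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<std::byte>((value >> (8 * i)) & 0xFFu);
    }
    return out;
}

// Example use: 5 comes out as 05 00 00 00 regardless of how the
// machine stores the integer internally.
inline void to_bytes_le_example() {
    const auto b = to_bytes_le(5u);
    assert(b[0] == std::byte{0x05});
    assert(b[1] == std::byte{0x00} && b[3] == std::byte{0x00});
}
```

Because the order comes from the shifts and not from the object representation, the result should be the same on little-endian and big-endian machines.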

The reverse is true too: you need to use binary shifts to convert a byte array to a type bigger than 1 byte.
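A minimal sketch of that reverse direction, assuming the four bytes are stored least significant first. The name from_bytes_le is hypothetical, and only the low 8 bits of each byte are used.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: rebuild a 32-bit value from four bytes,
// treating bytes[0] as the least significant 8 bits.
std::uint32_t from_bytes_le(const std::array<std::byte, 4>& bytes) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const std::uint32_t part =
            std::to_integer<std::uint32_t>(bytes[i]) & 0xFFu;
        value |= part << (8 * i);
    }
    return value;
}

// Example use: the bytes 05 00 00 00 give back the integer 5.
inline void from_bytes_le_example() {
    const std::array<std::byte, 4> raw{
        std::byte{0x05}, std::byte{0x00}, std::byte{0x00}, std::byte{0x00}};
    assert(from_bytes_le(raw) == 5u);
}
```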

On top of that, this only applies to primitive types. int, long, short... Sometimes you can rely on it working correctly with float or double as long as you always need IEEE 754 and will never need a machine so old or bizarre that it doesn't support IEEE 754. That's it.

If you think really long and hard, you may realize that this is no different from structs.

struct x {
    int a;
    int b;
};

What can we rely on? Well, we know that an object of type x has the same address as its first member, a. That's it. If we want to set b, we need to access it as x.b; every other assumption is ALWAYS wrong, with no ifs or buts. The only exception is if you wrote your own compiler and you are using your own compiler, but then you're ignoring the standard and at that point anything is possible; that's fine, but it's not C++ anymore.

So, what can we infer from what we know now? An array of bytes cannot just be memcpy'd into a std::bitset. You don't know its implementation and you cannot know its implementation; it may change tomorrow, and if your code breaks because of that, then it's wrong and you're a failure of a programmer.

Want to convert an array of bytes to a bitset? Then go ahead and iterate over every single bit in the byte array and set each bit of the bitset however you need it to be; that's the only correct and sane solution. Every other solution is objectively wrong, now and forever. Until someone decides to say otherwise in the C++ standard. Let's just hope that never happens.
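One way this could look, as a sketch only: the helper name bytes_to_bitset and the mapping (bit 0 of the first byte becomes bit 0 of the bitset) are assumptions, so pick whatever mapping your data format needs. The bitset size N still has to be a compile-time constant, so a file of arbitrary length needs a fixed upper bound or a different container.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical helper: copy `count` bytes into a bitset one bit at a time.
// Bit b of data[i] becomes bit (i * 8 + b) of the result; only the low
// 8 bits of each byte are used.
template <std::size_t N>
std::bitset<N> bytes_to_bitset(const std::byte* data, std::size_t count) {
    assert(count * 8 <= N && "bitset too small for the input");
    std::bitset<N> bits;
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned value = std::to_integer<unsigned>(data[i]);
        for (std::size_t b = 0; b < 8; ++b) {
            bits[i * 8 + b] = ((value >> b) & 1u) != 0;
        }
    }
    return bits;
}
```

With the bytes 05 00 00 00 from the question, bytes_to_bitset<32> yields a bitset whose to_ulong() is 5. Note that streaming a bitset with operator<< prints the highest-numbered bit first, so that value displays as 00000000000000000000000000000101.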

yotsugi