I am following a video tutorial on serialization. In the video, the author creates a std::vector<int8_t> and writes the values of integral-type variables into it byte by byte.

So if we take an int32_t variable, it will take 4 elements of that vector. The way he is doing it is by shifting the binary value of this variable by a corresponding number of bits. I understand why this is done. If we take the value 76745, for example, the vector values will look like {0, 1, 43, -55}.

Here is the method with a little simplification from my side:

template <typename T>
void encode(std::vector<int8_t> *buffer, int16_t *iterator, T value)
{
    for (size_t i = 0; i < sizeof(T); ++i)
    {
        (*buffer)[(*iterator)++] = value >> ((sizeof(T) * 8 - 8) - i*8); 
    }
}

My concern is that he claims we need to do this in order to convert the number to little endian. I don't think that is the case, because from what I understand, endianness is about how numbers are represented in memory. When we operate on a value with, e.g., bit shifts, we are operating on the value itself, so it has nothing to do with endianness, just like I read here.

He proceeds to give a piece of code to test, as he claims, whether the machine is little-endian:

bool IsLittleEndian() // weirdest
{
    int8_t a = 5;
    std::string res = std::bitset<8>(a).to_string();
    std::cout << res << '\n';
    return (res.back() == '1');
}

This also makes me doubtful, because from what I read, bitset does not depend on endianness. (Additionally, if the last bit of 5 is 1, then it should be big-endian, so I am sure a confusion occurred on his side.)

Can someone confirm whether my understanding is correct? I will gladly answer any clarifying questions.

His other content is of high quality, so I would really like to continue with the tutorial, but I need to have this figured out.

  • 5 is an odd number - the rightmost bit of it must therefore be 1. I don't see what this has to do with endianness. Perhaps this tutorial is not as good as you thought? – Neil Butterworth Dec 28 '22 at 00:21
  • Congratulations! You've just discovered why you will never learn C++ by watching random Youtube videos. Any clown can upload a video to Youtube. Even I can do that (and I did). There are several other ways you will never learn C++ too, such as by reading blogs, or attempting to solve random coding puzzles. And, I think we'll reserve judgement on whether any of those other videos are "high quality", or not. The only way to effectively learn C++ is from a good textbook. It takes money to turn dead trees into books. A publisher is not going to take risks without vetting the source for quality. – Sam Varshavchik Dec 28 '22 at 00:22
  • Endianness is about _byte_ ordering. The bogus `IsLittleEndian()` is only looking at the low _bit_ of a single byte. No way to get information about _byte_ ordering from that. – Avi Berger Dec 28 '22 at 00:24
  • Why is `buffer` a pointer? That implies that you may want to send in a `nullptr` to the function, but you never check if it's a `nullptr` or not. – Ted Lyngmo Dec 28 '22 at 00:31
  • Sidenote, C++20 provides a way to find out the endianness of the system through [`std::endian`](https://en.cppreference.com/w/cpp/types/endian) – Ranoiaetep Dec 28 '22 at 03:02

2 Answers

4

Your understanding is correct: endianness has nothing to do with arithmetic operations or bitset::to_string(). Both work in exactly the same way regardless of the underlying implementation details, and there is no "bit endianness" at all, see here. You could run your code on a ternary Soviet computer for all it matters, and the behavior of the program would be the same.

What endianness does affect in C++ is implementation-defined stuff like memory layout and byte representation. One can exploit that to use std::memcpy and reinterpret_cast to efficiently (de)serialize integers. Be wary of the strict aliasing rule, though, if you want somewhat more portability in theory.
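As a rough sketch of the std::memcpy approach mentioned above: the two helpers below (to_native_bytes and from_native_bytes are hypothetical names, not from the tutorial) copy the object representation of a uint32_t in and out of a byte vector. The order of the bytes in the vector is whatever the host uses, so this only round-trips safely between machines with the same endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies the in-memory bytes of `value` as they are.
// The resulting order depends on the host's endianness.
std::vector<std::uint8_t> to_native_bytes(std::uint32_t value)
{
    std::vector<std::uint8_t> bytes(sizeof value);
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}

// Hypothetical helper: the inverse of to_native_bytes on the same platform.
std::uint32_t from_native_bytes(const std::vector<std::uint8_t>& bytes)
{
    std::uint32_t value = 0;
    assert(bytes.size() >= sizeof value);  // caller must supply enough bytes
    std::memcpy(&value, bytes.data(), sizeof value);
    return value;
}
```

Because std::memcpy copies raw bytes rather than reading the integer through a pointer of another type, it avoids the strict aliasing concern.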

One could say that the encode function translates an integer into an array of bytes in a little-endian way (lower bytes at lower addresses), and that it does so regardless of the endianness of the underlying system. That may be useful when you want the same serialization protocol on different systems.

However, that is still not what the encode function does: it stores the most significant byte at the lowest address, and so on. So it is actually encoding the value in a big-endian way.
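To make the difference concrete, here is a minimal sketch of both byte orders written with shifts only. The names encode_big_endian and encode_little_endian are made up for illustration, and the functions return a fresh vector of uint8_t instead of writing into the question's buffer, which is an assumption made to keep the example short. They are meant for integer types other than bool.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Most significant byte first, which is what the question's loop produces.
template <typename T>
std::vector<std::uint8_t> encode_big_endian(T value)
{
    static_assert(std::is_integral<T>::value, "integral types only");
    using U = typename std::make_unsigned<T>::type;
    const U bits = static_cast<U>(value);  // shift an unsigned copy of the value
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        out.push_back(static_cast<std::uint8_t>(bits >> (8 * (sizeof(T) - 1 - i))));
    return out;
}

// Least significant byte first.
template <typename T>
std::vector<std::uint8_t> encode_little_endian(T value)
{
    static_assert(std::is_integral<T>::value, "integral types only");
    using U = typename std::make_unsigned<T>::type;
    const U bits = static_cast<U>(value);
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        out.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));
    return out;
}
```

For the question's value 76745 (0x00012BC9) as an int32_t, encode_big_endian gives {0x00, 0x01, 0x2B, 0xC9}, which matches the {0, 1, 43, -55} from the question once 0xC9 is read as a signed byte, and encode_little_endian gives {0xC9, 0x2B, 0x01, 0x00}. Both results are the same on any host.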

yeputons
1

The first part somewhat makes sense. The second part not so much.

If you were to simply std::copy the bytes of your value into buffer, i.e.

template <typename T>
void encode(std::vector<uint8_t>& buffer, size_t& position, T value)
{
    uint8_t* bytes = reinterpret_cast<uint8_t*>(&value);
    std::copy(bytes, bytes + sizeof(value), &buffer[position]);
    position += sizeof(T);
}

You would end up with buffer containing the bytes of value in whatever order the platform uses. That is, encode(buffer, position, 0x01020304) would result in buffer containing {0x04, 0x03, 0x02, 0x01} on little-endian systems and {0x01, 0x02, 0x03, 0x04} on big-endian systems.

Using bit shifts to extract each byte of the value side-steps this issue, since bit-shift operations work on the value and don't reflect platform endianness. For an integer n, n & 0xFF will always give you the least significant byte, (n >> 8) & 0xFF the next byte up, and so on.
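The same reasoning applies when reading the bytes back. A minimal sketch, assuming the data was written least significant byte first (read_u32_le is a hypothetical name, not part of the tutorial):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rebuilds a 32-bit value from four bytes stored least significant byte first.
// Only shifts and ORs on values are used, so the host's byte order is irrelevant.
std::uint32_t read_u32_le(const std::vector<std::uint8_t>& buffer, std::size_t& position)
{
    assert(position + 4 <= buffer.size());  // caller must supply enough bytes
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(buffer[position++]) << (8 * i);
    return value;
}
```

Given a buffer holding {0x04, 0x03, 0x02, 0x01} and a position of 0, this returns 0x01020304 and advances the position to 4 on any platform.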


The second part makes no sense at all though. That IsLittleEndian function is nonsense. Endianness is byte order, not bit order. An int8_t, being only one byte, will never tell you anything about the platform's endianness.
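For comparison, a check that does look at byte order has to inspect the individual bytes of a multi-byte integer. The following is only a sketch (is_little_endian is an illustrative name); it treats anything that is not little-endian as "not little", so it does not distinguish big-endian from exotic mixed orders. With C++20, std::endian from the comments answers the same question at compile time.

```cpp
#include <cstdint>
#include <cstring>

// Returns true if the least significant byte of a multi-byte integer
// is stored at the lowest address on this machine.
bool is_little_endian()
{
    const std::uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);  // copy the object representation
    return bytes[0] == 0x04;
}
```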

Miles Budnek