Converting uint8_t* buffer to uint16_t and changing endianness

Question

I'd like to process data provided by an external library.

The lib holds the data and provides access to it like this:

const uint8_t* data;
std::pair<const uint8_t*, const uint8_t*> getvalue() const {
  return std::make_pair(data + offset, data + length);
}

I know that the current data contains two uint16_t numbers, but I need to change their endianness. Altogether the data is 4 bytes long and contains these numbers:

66 4 0 0

So I'd like to get two uint16_t numbers with 1090 and 0 value respectively.

I can do basic arithmetic and in one place change the endianness:

pair<const uint8_t*, const uint8_t*> dataPtrs = library.getvalue();
vector<uint8_t> data(dataPtrs.first, dataPtrs.second);

uint16_t first = (data[1] << 8) + data[0];
uint16_t second = (data[3] << 8) + data[2];

However, I'd like to do something more elegant (the vector is replaceable if there is a better way of getting the uint16_ts).

How can I better create uint16_t from uint8_t*? I'd avoid memcpy if possible, and use something more modern/safe.

Boost has a nice header-only endian library which could work, but it needs a uint16_t input.

For going further, Boost also provides data types for changing endianness, so I could create a struct:

struct datatype {
    big_int16_buf_t     data1;
    big_int16_buf_t     data2;
};

Is it possible to safely (paddings, platform-dependency, etc) cast a valid, 4 bytes long uint8_t* to datatype? Maybe with something like this union?

typedef union {
    uint8_t u8[4];
    datatype correct_data;
} mydata;

Reinterpret-casting via a union is not allowed in C++ (other than between corresponding members of standard-layout structs). That's a Cism only. You'd need to `memcpy()` or from C++20 `std::bit_cast()`. — underscore_d, Jun 11 '20 at 13:41
@underscore_d You can't `std::bit_cast` from a pointer (unless you're bit casting the pointer value rather than pointed objects). — eerorika, Jun 11 '20 at 13:43
@eerorika Well, yeah, but you could deref the pointer to get bytes and then bit_cast those bytes to the desired type. I was proceeding from the final bit of code, which is using values. — underscore_d, Jun 11 '20 at 13:53
@underscore_d If you indirect through a `std::uint8_t*`, then you get a `std::uint8_t` which is not of correct size to be bit_casted into `std::uint16_t` (nor into `datatype`). I don't see how bit_cast could be used here. — eerorika, Jun 11 '20 at 14:05

eerorika · Accepted Answer · 2020-06-11T14:34:31.800

2

Maybe with something like this union?

No. Type punning with unions is not well defined in C++.

This would work, assuming big_int16_buf_t, and therefore datatype, is trivially copyable:

datatype d{};
std::memcpy(&d, data, sizeof d);

uint16_t first = (data[1] << 8) + data[0];
uint16_t second = (data[3] << 8) + data[2];
However I'd like to do something more elegant

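This is actually (subjectively, in my opinion) quite an elegant way because it works the same way on all systems. It reads the data as little-endian whether the CPU is little-endian, big-endian, or something else, so it is portable. Note that the shifts must be parenthesised, as in (data[1] << 8) + data[0], because + binds tighter than <<.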
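As a minimal sketch of that shift-based approach, the two lines can be wrapped in a small helper. The name read_u16_le is hypothetical (it is not part of the library in the question), and the caller is assumed to guarantee that at least two bytes are readable at p:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the library from the question.
// Reads a 16-bit unsigned value stored least-significant byte first.
// Assumption: the caller guarantees that p[0] and p[1] are readable.
inline std::uint16_t read_u16_le(const std::uint8_t* p) {
    assert(p != nullptr);
    // Each byte is promoted to int before shifting, so the shift cannot
    // overflow. The parentheses are required: without them the
    // expression would not shift by 8.
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
```

With the bytes 66 4 0 0 from the question, read_u16_le(dataPtrs.first) would give 1090 and read_u16_le(dataPtrs.first + 2) would give 0, regardless of the host's byte order.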

However, I'd like to do something more elegant (the vector is replaceable if there is a better way of getting the uint16_ts).

The vector seems entirely pointless. You could just as well use:

const std::uint8_t* data = dataPtrs.first;

edited Jun 11 '20 at 14:34

answered Jun 11 '20 at 13:52


Is the `memcpy` approach worse in terms of _portability_ compared to the manual bitwise method? – Daniel Jun 11 '20 at 14:51
@Daniel `memcpy` into `uint16_t` would not be good because the result would depend on native endianness. I don't know how `big_int16_buf_t` works exactly, so I'm not sure if memcpying into it does what one would want. It might be just fine. – eerorika Jun 11 '20 at 14:52

score 0 · Answer 2 · answered Jun 11 '20 at 14:11

How can I better create uint16_t from uint8_t*?

If you are certain that the data sitting behind the uint8_t pointer is truly a uint16_t object, C++ allows: auto u16 = *reinterpret_cast<uint16_t const*>(data); (a static_cast between these unrelated pointer types does not compile). Otherwise, this is UB.

Given a big-endian value, converting it to the host's byte order can be done with the ntohs function (POSIX; other OSes have similar functions).

But beware, if the pointer you hold points to two individual uint8_t values, you mustn't convert them by pointer-cast. In that case, you have to manually specify which value goes where (conceivably with a function template). This will be the most portable solution, and in all likelihood the compiler will create efficient code out of the shifts and ors.
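A function template along those lines might look like the sketch below. The names load_le and load_be are hypothetical, and the caller is assumed to guarantee that sizeof(T) bytes are readable at p:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: assembles an unsigned integer of type T from
// sizeof(T) bytes stored least-significant byte first (little-endian).
// Assumption: the caller guarantees sizeof(T) readable bytes at p.
template <typename T>
T load_le(const std::uint8_t* p) {
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned integer type");
    assert(p != nullptr);
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        // Byte i contributes bits [8*i, 8*i + 7] of the result.
        value = static_cast<T>(value | (static_cast<T>(p[i]) << (8 * i)));
    }
    return value;
}

// Same idea for data stored most-significant byte first (big-endian).
template <typename T>
T load_be(const std::uint8_t* p) {
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned integer type");
    assert(p != nullptr);
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        // Shift the bytes read so far up and append the next one.
        value = static_cast<T>((value << 8) | p[i]);
    }
    return value;
}
```

Because the byte order is spelled out in the arithmetic, the result does not depend on the host's endianness. For the question's bytes 66 4 0 0, load_le<std::uint16_t>(data) would yield 1090, while load_be<std::uint16_t>(data) would yield 16900.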
