Is C++ abstraction endian-neutral?

Question

Suppose I have a client and a server that exchange 16-bit numbers over some network protocol, for example ModbusTCP, although the protocol is not relevant here.

I know that the client (my PC) is little-endian and the server (some PLC) is big-endian. The client is written entirely in C++ with Boost Asio sockets. With this setup, I thought I had to swap the bytes received from the server to store the number correctly in a uint16_t variable. However, this turns out to be wrong: when I swap, I read incorrect values.

My understanding so far is that my C++ abstraction is storing the values into variables correctly without the need for me to actually care about swapping or endianness. Consider this snippet:

// received 0x0201  (513 in big endian)
uint8_t high { 0x02 };  // first byte
uint8_t low { 0x01 };   // second byte
// merge into 16 bit value (no swap)
uint16_t val = (static_cast<uint16_t>(high)<< 8) | (static_cast<uint16_t>(low));
std::cout<<val;   //correctly prints 513

This somewhat surprised me, because when I inspect the memory representation through pointers, I find that the bytes are actually stored in little-endian order on the client:

// take the address of val, convert it to uint8_t pointer
auto addr = reinterpret_cast<uint8_t*>(&val);   // static_cast does not compile here
// take the first and second bytes and print them 
printf ("%d ", (int)addr[0]);   // print 1
printf ("%d", (int)addr[1]);    // print 2

So the question is:

As long as I don't mess with memory addresses and pointers, C++ can guarantee me that the values I'm reading from the network are correct no matter the endianness of the server, correct? Or am I missing something here?

EDIT: Thanks for the answers. I want to add that I'm currently using boost::asio::write(socket, boost::asio::buffer(data)) to send data from the client to the server, and data is a std::vector<uint8_t>. So my understanding is that as long as I fill data in network order, I should not care about the endianness of my system (or even of the server, for 16-bit data), because I'm operating on the "values" and not reading bytes directly from memory, right?

To use the htons family of functions I would have to change my underlying TCP layer to use memcpy or similar and a uint8_t* data buffer, which is more C-esque than C++ish. Why should I do it? Is there an advantage I'm not seeing?
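The approach described in the edit above, filling a std::vector<uint8_t> in network order using only value operations, could be sketched as follows. The helper names append_u16_be and read_u16_be are hypothetical and not part of Boost or the standard library; this is a minimal illustration, not the asker's actual code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append a 16-bit value to the buffer,
// most significant byte first (big-endian / network order).
// Only shifts and masks are used, so host endianness does not matter.
void append_u16_be(std::vector<std::uint8_t>& buf, std::uint16_t value)
{
    buf.push_back(static_cast<std::uint8_t>(value >> 8));   // high byte
    buf.push_back(static_cast<std::uint8_t>(value & 0xFF)); // low byte
}

// Hypothetical helper: read a big-endian 16-bit value starting at offset.
// The caller must make sure that offset + 1 is a valid index into buf.
std::uint16_t read_u16_be(const std::vector<std::uint8_t>& buf, std::size_t offset)
{
    return static_cast<std::uint16_t>((buf[offset] << 8) | buf[offset + 1]);
}
```

A buffer filled this way can be handed to boost::asio::buffer(data) unchanged, since the bytes are already in wire order.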

"*C++ can guarantee me that the values I'm reading from the network are correct no matter the endian of the server, correct*": no. To keep the same source on both server and client(s) and be sure integers are read in the right order, one way is to always use functions like ntohl when you read, and the reverse (htonl) when you write, and let them do the work, because they know whether the host is little- or big-endian. — bruno, Jun 22 '20 at 16:13
If you were to "serialize" an object on a little endian machine by taking its bytes in order, and then deserialize that network transmission on a big endian machine, the order would absolutely matter (that's assuming the alignment and padding didn't get in the way first). That's why we don't serialize like this, and that's why you need to define your network byte order when writing a communication protocol. Your communication protocol (boost and/or modbus) is likely taking care of this for you. — JohnFilleau, Jun 22 '20 at 16:13

score 2 · Accepted Answer · answered Jun 22 '20 at 16:43

(static_cast<uint16_t>(high)<< 8) | (static_cast<uint16_t>(low)) has the same behaviour regardless of endianness: the "left" end of a number is always the most significant bit, and endianness only changes whether that bit is stored in the first or the last byte in memory.

For example:

uint16_t input = 0x0201;
uint8_t leftByte = input >> 8; // same result regardless of endianness
uint8_t rightByte = input & 0xFF; // same result regardless of endianness
uint8_t data[2];
memcpy(data, &input, sizeof(input)); // data will be {0x02, 0x01} or {0x01, 0x02} depending on endianness

The same applies in the other direction:

uint8_t data[] = {0x02, 0x01};
uint16_t output1;
memcpy(&output1, data, sizeof(output1)); // will be 0x0102 or 0x0201 depending on endianness
uint16_t output2 = data[0] << 8 | data[1]; // will be 0x0201 regardless of endianness

To ensure your code works on all platforms, it's best to use the htons and ntohs family of functions:

uint16_t input = 0x0201; // input is in host order
uint16_t networkInput = htons(input);
uint8_t data[2];
memcpy(data, &networkInput, sizeof(networkInput));
// data is big endian or "network" order
uint16_t networkOutput;
memcpy(&networkOutput, data, sizeof(networkOutput));
uint16_t output = ntohs(networkOutput);  // output is in host order

I see, but I don't use memcpy at all, I use `boost::asio::write(socket, boost::asio::buffer(data));`, where data is a `std::vector<uint8_t>`. So as long as I fill data in network order I don't need to use `htons`, right? — Federico Spinelli, Jun 23 '20 at 07:39

score 1 · Answer 2 · answered Jun 22 '20 at 16:53

The first fragment of your code works correctly because you don't work with byte addresses directly. Such code compiles to the correct result regardless of your platform's endianness, because the C++ language defines the operators '<<' and '|' in terms of values, not memory layout.

The second fragment of your code proves this, showing actual values of separate bytes on your little-endian system.

TCP/IP networking standardizes on the big-endian format and provides the following utilities:

before sending multi-byte numeric values, use the standard functions htonl ("host-to-network-long") and htons ("host-to-network-short") to convert your values to the network representation,
after receiving multi-byte numeric values, use the standard functions ntohl ("network-to-host-long") and ntohs ("network-to-host-short") to convert your values to your platform-specific representation.

(Strictly speaking, these 4 utilities only convert on little-endian platforms and do nothing on big-endian platforms. But always using them makes your code platform-independent.)
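To illustrate the "convert only on little-endian hosts" behaviour, here is a rough sketch of how a function like htons could behave. The names host_is_little_endian and to_network_u16 are hypothetical stand-ins written for this example; real code should call the platform's htons/ntohs instead.

```cpp
#include <cstdint>
#include <cstring>

// Detect the host byte order at run time by looking at the first byte
// in memory of a known 16-bit value.
bool host_is_little_endian()
{
    const std::uint16_t probe = 0x0001;
    std::uint8_t first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 0x01; // low byte stored first means little-endian
}

// Hypothetical stand-in for htons: swaps the two bytes on little-endian
// hosts and returns the value unchanged on big-endian hosts, so that the
// in-memory representation of the result is always big-endian.
std::uint16_t to_network_u16(std::uint16_t host_value)
{
    if (host_is_little_endian()) {
        return static_cast<std::uint16_t>((host_value << 8) | (host_value >> 8));
    }
    return host_value;
}
```

Because a 16-bit byte swap is its own inverse, the same sketch also works in the network-to-host direction.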

With ASIO you have access to these utilities using: #include <boost/asio.hpp>

You can read more by searching for 'man htonl' or 'msdn htonl' on Google.

score 1 · Answer 3 · answered Jun 22 '20 at 20:52

About Modbus:

For 16-bit words, Modbus sends the most significant byte first, which means it uses big-endian. So if the client or the server uses little-endian, it will have to swap the bytes when sending or receiving.

Another problem is that Modbus does not define in what order 16-bit registers are sent for 32-bit types.

There are Modbus server devices that send the most significant 16-bit register first and others that do the opposite. The only solution is for the client configuration to offer the option of swapping the 16-bit registers.

A similar problem can also occur when character strings are transmitted: some servers send badcfe instead of abcdef.
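A configurable register swap for 32-bit values could look roughly like the sketch below. The WordOrder enum and the combine_registers function are hypothetical names chosen for this example, not part of any Modbus library; each 16-bit register is assumed to be already decoded into a host value.

```cpp
#include <cstdint>

// Hypothetical client-side configuration option: which of the two
// 16-bit registers carries the most significant half of the 32-bit value.
enum class WordOrder { HighWordFirst, LowWordFirst };

// Combine two already-decoded 16-bit registers, given in the order they
// were received, into one 32-bit value according to the configured order.
std::uint32_t combine_registers(std::uint16_t first,
                                std::uint16_t second,
                                WordOrder order)
{
    const std::uint16_t high = (order == WordOrder::HighWordFirst) ? first : second;
    const std::uint16_t low  = (order == WordOrder::HighWordFirst) ? second : first;
    return (static_cast<std::uint32_t>(high) << 16) | low;
}
```

Since only shifts and ORs are involved, the result does not depend on the endianness of the client; only the device-specific register order has to be configured.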

right, for 32 bit data I might need to perform some word swapping to achieve BADC or CDAB, thanks — Federico Spinelli, Jun 23 '20 at 08:02
