
My coworker wants to send some data represented by a type T over a network. He does this the traditional way™: he casts a pointer to the T to char* and sends it with a write(2) call on a socket:

#include <unistd.h>  // write(2)

auto send_some_t(int sock, T const* p) -> void
{
    auto buffer = reinterpret_cast<char const*>(p);
    write(sock, buffer, sizeof(T));
}

So far, so good. This simplified example, apart from being stripped of any error-checking, should be correct. Assuming the type T is trivially copyable, we can copy values of this type between objects using std::memcpy() (according to 6.7 [basic.types] point 3 in the C++17 standard[1]), so I guess write(2) should also work, as it blindly copies binary data.
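Here is a sketch of the sending side with the error checking put back in. It assumes a POSIX system and the struct T defined further down in the question. The static_assert encodes the trivially-copyable precondition the paragraph above relies on. write_all is a hypothetical helper, not part of any library. It loops because write(2) on a socket may accept fewer bytes than requested:

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <unistd.h>

struct T {                 // same layout as the struct in the question
    std::uint64_t foo;
    std::uint8_t bar;
    std::uint16_t baz;
};

// Copying the object representation byte-wise is only meaningful for
// trivially copyable types.
static_assert(std::is_trivially_copyable_v<T>, "T must be trivially copyable");

// Hypothetical helper: keep calling write(2) until every byte is accepted,
// retrying on EINTR. Returns false on any other error.
bool write_all(int fd, const void* data, std::size_t len)
{
    auto p = static_cast<const char*>(data);
    while (len > 0) {
        ssize_t n = ::write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        p += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

bool send_some_t(int sock, const T* p)
{
    return write_all(sock, p, sizeof(T));
}
```

This only handles the local, language-level concerns. It says nothing about whether the receiver lays out T the same way.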

Where it gets tricky is on the receiving side.

Assume the type T in question looks like this:

struct T {
    uint64_t foo;
    uint8_t bar;
    uint16_t baz;
};

On typical 64-bit platforms it has a field with an alignment requirement of 8 bytes (foo), so the whole type requires a strict alignment of 8 bytes (see the example for 6.6.5 [basic.align] point 2). This means that storage for values of type T must be allocated only at suitably aligned addresses.

Now, what about the following code?

auto receive_some_t(int sock, T* p) -> void
{
    read(sock, p, sizeof(T));
}

// ...

T value;
receive_some_t(sock, &value);

Looks shady, but should work OK. The bytes received do represent a valid value of type T and are blindly copied into a valid object of type T.

However, what about using raw char buffers like in the following code:

char buffer[sizeof(T)];
read(sock, buffer, sizeof buffer);

T* value = reinterpret_cast<T*>(buffer);

This is where my coder-brain triggers a red alert. We have absolutely no guarantee that the alignment of char[sizeof(T)] matches that of T, which is a problem. We also do not round-trip a pointer to a valid T object, because there never was an object of type T in that memory. And we don't know what compiler and options were used on the other side (maybe the struct on the other side is packed while ours is not).
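The alignment half of this concern can be demonstrated directly. The sketch below uses the question's struct T. is_suitably_aligned_for_t is a hypothetical helper. Converting a pointer to std::uintptr_t to test its address is implementation-defined, although it behaves as expected on mainstream platforms:

```cpp
#include <cstdint>

struct T {
    std::uint64_t foo;
    std::uint8_t bar;
    std::uint16_t baz;
};

// alignof of an array type is the alignment of its element type, so a plain
// char[sizeof(T)] is only guaranteed to be 1-byte aligned.
static_assert(alignof(char[sizeof(T)]) == 1);

// Hypothetical helper: does this address satisfy T's alignment requirement?
// The pointer-to-integer mapping is implementation-defined; this is the
// conventional check on mainstream platforms.
bool is_suitably_aligned_for_t(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

For example, take `alignas(T) unsigned char raw[sizeof(T) + 1];`. Here raw passes the check, while raw + 1 fails on any platform where alignof(T) > 1. An alignas(T) buffer (or std::aligned_storage<sizeof(T), alignof(T)>) therefore fixes the alignment problem. However, no T object lives in it, and that lifetime problem is the one the answer below addresses.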

In short, I see some potential problems with just casting raw char buffers into other types and would try to avoid writing code such as above. But apparently it works and is how "everybody does it".

My question is: is recovering structs sent over a network and received into a char buffer of appropriate size legal according to C++17 standard?

If not, what about using std::aligned_storage<sizeof(T), alignof(T)> to receive such structs? And if std::aligned_storage is not legal either, is there any legal way of sending raw structs over a network, or is it a bad idea that just happens to work... until it doesn't?

I view structs as a way of representing data types, and I treat the way the compiler lays them out in memory as an implementation detail, not as a wire format for data exchange to be relied upon, but I am open to being wrong.

[1] www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4713.pdf

Maelkum
  • *But apparently it works and is how "everybody does it".* Doesn't always work, and sometimes everybody is wrong. – user4581301 Jun 01 '21 at 00:04
  • Nothing shown above is guaranteed to work. When it comes to network sockets there is no guarantee, whatsoever, that `read()` will read the number of bytes from the socket that's specified by its third parameter. It could be anywhere between 1 and that value, assuming that the socket remains open. Ignoring the return value from `read()` will always end in tears. – Sam Varshavchik Jun 01 '21 at 00:09
  • There is no guarantee that the *byte pattern* of type `T` compiled on one machine is the same as the *byte pattern* of type `T` compiled on another machine. Even if you use the same architecture, different versions of the compiler may, in theory, do different things. You may get away with it but it is very non-portable at best. – Galik Jun 01 '21 at 00:32
  • In all instances, you should not *reinterpret_cast* to your type, but copy the bytes from the *char buffer* into an object of type `T`. But I would recommend something much more portable than sending raw binary over a network. – Galik Jun 01 '21 at 00:36
  • Don't do this. Don't use structs as network protocols. Use network protocols as network protocols. Define it in terms of octets and write yourself a library to send and receive it. At present you are totally dependent on having the same (a) compiler (b) compiler options (c) endian-ness (d) alignment (e) padding at both ends. The only way to really ensure it must work is to build both ends from the same .o or .obj file for the sending and receiving and `struct` interpretation. – user207421 Jun 01 '21 at 00:36
  • *But apparently it works...* It'll work until it doesn't. At which point you'll be livin' the nightmare. I wrote a program on DEC Alpha Tru64 architecture, and sending structs over the wire "apparently worked". Then we added i386 Linux boxes to our product, and then everything went all pear-shaped. – Eljay Jun 01 '21 at 01:10
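user207421 suggests defining the protocol in terms of octets. Here is a minimal sketch of such an explicit wire format for the question's struct T. The 11-byte big-endian layout and the names encode, decode, WireT and wire_size are assumptions for illustration, not an established protocol:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

struct T {
    std::uint64_t foo;
    std::uint8_t bar;
    std::uint16_t baz;
};

// Hypothetical wire format: 11 octets, fields in declaration order,
// multi-byte integers in big-endian (network) byte order, no padding.
constexpr std::size_t wire_size = 8 + 1 + 2;
using WireT = std::array<unsigned char, wire_size>;

WireT encode(const T& t)
{
    WireT out{};
    for (std::size_t i = 0; i < 8; ++i)        // foo: most significant byte first
        out[i] = static_cast<unsigned char>(t.foo >> (56 - 8 * i));
    out[8] = t.bar;
    out[9] = static_cast<unsigned char>(t.baz >> 8);
    out[10] = static_cast<unsigned char>(t.baz);
    return out;
}

T decode(const WireT& in)
{
    T t{};
    for (std::size_t i = 0; i < 8; ++i)        // rebuild foo byte by byte
        t.foo = (t.foo << 8) | in[i];
    t.bar = in[8];
    t.baz = static_cast<std::uint16_t>((in[9] << 8) | in[10]);
    return t;
}
```

Every byte is produced and consumed with shifts on integer values, so the result does not depend on padding, host endianness, or struct packing on either side.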

1 Answer


The dicey part is not so much the memory alignment as the lifetime of the T object. A reinterpret_cast<> of the memory to a T pointer does not create an instance of T, and using the result as if it pointed to one leads to undefined behavior.

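In C++, every object has a lifetime: it has to come into being before it can be used, and it stops existing afterwards. That applies even to basic data types like int and float. The relevant exception is that the bytes of any object may be accessed through a char, unsigned char or std::byte glvalue. char itself is not exempt from lifetime rules; it is only allowed to alias other objects.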

In other words, what's legal is to copy the bytes from the buffer into an already existing object, like so:

#include <cstring>  // std::memcpy

char buffer[sizeof(T)];

// fill the buffer (e.g. with read(2))...

T value;
std::memcpy(&value, buffer, sizeof(T));  // value's lifetime has already begun

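Don't worry about performance. For a small, fixed-size copy like this, mainstream optimizing compilers typically replace the std::memcpy call with a few plain loads and stores, or elide it entirely.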
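Combining the answer with Sam Varshavchik's comment about short reads, a receiving side might look like the sketch below. It assumes a POSIX system and the question's struct T. read_exact is a hypothetical helper, and it treats EOF before sizeof(T) bytes have arrived as a failure:

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <unistd.h>

struct T {
    std::uint64_t foo;
    std::uint8_t bar;
    std::uint16_t baz;
};
static_assert(std::is_trivially_copyable_v<T>);

// Hypothetical helper: read(2) may return fewer bytes than requested,
// so loop until `len` bytes have arrived. Returns false on error or EOF.
bool read_exact(int fd, void* data, std::size_t len)
{
    auto p = static_cast<char*>(data);
    while (len > 0) {
        ssize_t n = ::read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        if (n == 0) return false;   // peer closed before sending a whole T
        p += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Receive the raw bytes into a char buffer, then copy them into a real T
// object whose lifetime has already begun; no reinterpret_cast involved.
bool receive_some_t(int sock, T& value)
{
    char buffer[sizeof(T)];
    if (!read_exact(sock, buffer, sizeof buffer))
        return false;
    std::memcpy(&value, buffer, sizeof(T));
    return true;
}
```

This keeps the code within the language rules on a single machine. However, it still assumes that both ends agree on sizeof(T), padding and byte order, and the comments under the question explain why that is fragile across compilers and architectures.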