How should I approach parsing the network packet using C++ template?

Question

Let's say I have an application that keeps receiving the byte stream from the socket. I have the documentation that describes what the packet looks like. For example, the total header size, and total payload size, with the data type corresponding to different byte offsets. I want to parse it as a struct. The approach I can think of is that I will declare a struct and disable the padding by using some compiler macro, probably something like:

struct Payload
{
   char field1;
   uint32 field2;
   uint32 field3;
   char field5;
} __attribute__((packed));

and then I can declare a buffer and memcpy the bytes to the buffer and reinterpret_cast it to my structure. Another way I can think of is that process the bytes one by one and fill the data into the struct. I think either one should work but it is kind of old school and probably not safe.

The reinterpret_cast approach mentioned, should be something like:

void receive(const char*data, std::size_t data_size)
{
    if(data_size == sizeof(payload)
    {
        const Payload* payload = reinterpret_cast<const Payload*>(data);
       // ... further processing ...
    }
}

I'm wondering are there any better approaches (more modern C++ style? more elegant?) for this kind of use case? I feel like using metaprogramming should help but I don't have an idea how to use it.

Can anyone share some thoughts? Or Point me to some related references or resources or even relevant open source code so that I can have a look and learn more about how to solve this kind of problem in a more elegant way.

If you use the class presented in the code snippet, definetly make sure to use `static_assert` with `sizeof` and `offsetof` to ensure the compiler actually creates a structure matching your header. Note that the code may not be portable though and most likely you do need to do an endianness conversion on the numbers: Usually network protocols encode numbers as big endian and many architectures use little endian for the encoding of numbers... — fabian, May 26 '22 at 17:17
Other than what @fabian said, if performance are a relevant matter, be aware that your suggested approach could reduce them more than expected. — Federico, May 26 '22 at 17:53
" I can declare a buffer and memcpy the bytes to the buffer and reinterpret_cast it to my structure" No you may not. That's Undefined Behavior. However, you can `memcpy` the bytes into the address of an instance of the struct `reinterpret_cast`ed to `char*`. — , May 26 '22 at 18:10
Congratulations. You just killed performance. Never used packed to access data more than once. You need to memcpy the individual fields into a not-packed structure or copy-assign them from the packed to a not-packed version (which just automates the memcpy). — Goswin von Brederlow, May 26 '22 at 19:50
You may want some sort of mechanism to serialize the meta info into the network packet, and then a mechanism to deserialize the packet into a meta info structure. If your network packet includes variable length entries, it can get difficult without using some kind of mechanism like that. See [this](https://stackoverflow.com/a/25938871/315052) for an example of defining an interface over a fairly complicated on the wire data packet. — jxh, May 26 '22 at 22:13
@Frank I have updated a snippet of the approach I mentioned, do you mean this is UB? Actually performance is very critical in my usecase, how to analysis the potential performance loss of my approach? — Kelvinyu1117, May 27 '22 at 04:24
@Goswin von Brederlow actually I'm wondering the cache locality thing, I'm trying to fit the size of the struct into a cache line. I think it should be helpful for reducing the chance of cache misses, isn't it? — Kelvinyu1117, May 27 '22 at 04:29
@Kelvinyu1117 Some architectures don't allow unaligned access (Alpha, Arm optionally, ...) while for others it's slow (older x86, when crossing cache lines). When unaligned access is impossible the compiler has to access each member byte by byte and reconstruct the value with shifts and ors. Or split it up to write the value byte by byte. If you do that more than once you really are just better of doing memcpy. The instruction overhead outweighs the cache savings. And if you want to save cache you should move `field5` after `field1`. — Goswin von Brederlow, May 27 '22 at 13:05
@Kelvinyu1117 Yes, this is UB, for multiple reasons: 1) it's a strict aliasing violation. 2) (probably) You risk reading unaligned data. You should do `Payload payload; std::memcpy(&payload, data, sizeof(Payload));` instead. It is both safe, and compilers will optimize this correctly, so there is no overhead to doing the correct thing. — , May 28 '22 at 14:46
@Kelvinyu1117 As a general principle: compilers are shockingly smart. You should **never** assume that something is slower than a hacky way of doing things until you've actually looked at the resulting assembly. Case in point: https://gcc.godbolt.org/z/K51zhjGfj — , May 28 '22 at 14:50
@Frank Thank you for your explanation! In terms of the performance, I guess your approach to using a template to parse the packet should be similar to `std::memcpy` right? So the reason for us to use a template to do instead of processing it byte by byte is that the template allows us to reuse some of the common operations, like handling the endianess. Can I say that its benefit is more on the code reusability than the performance? — Kelvinyu1117, May 28 '22 at 18:44

score 1 · Accepted Answer · 2022-05-26T19:57:13.680

There are many different ways of approaching this. Here's one:

Keeping in mind that reading a struct from a network stream is semantically the same thing as reading a single value, the operation should look the same in either case.

Note that from what you posted, I am inferring that you will not be dealing with types with non-trivial default constructors. If that were the case, I would approach things a bit differently.

In this approach, we:

Define a read_into(src&, dst&) function that takes in a source of raw bytes, as well as an object to populate.
Provide a general implementation for all arithmetic types is provided, switching from network byte order when appropriate.
Overload the function for our struct, calling read_into() on each field in the order expected on the wire.

#include <cstdint>
#include <bit>
#include <concepts>
#include <array>
#include <algorithm>

// Use std::byteswap when available. In the meantime, just lift the implementation from 
// https://en.cppreference.com/w/cpp/numeric/byteswap
template<std::integral T>
constexpr T byteswap(T value) noexcept
{
    static_assert(std::has_unique_object_representations_v<T>, "T may not have padding bits");
    auto value_representation = std::bit_cast<std::array<std::byte, sizeof(T)>>(value);
    std::ranges::reverse(value_representation);
    return std::bit_cast<T>(value_representation);
}

template<typename T>
concept DataSource = requires(T& x, char* dst, std::size_t size ) {
  {x.read(dst, size)};
};

// General read implementation for all arithmetic types
template<std::endian network_order = std::endian::big>
void read_into(DataSource auto& src, std::integral auto& dst) {
  src.read(reinterpret_cast<char*>(&dst), sizeof(dst));

  if constexpr (sizeof(dst) > 1 && std::endian::native != network_order) {
    dst = byteswap(dst);
  }
}

struct Payload
{
   char field1;
   std::uint32_t field2;
   std::uint32_t field3;
   char field5;
};

// Read implementation specific to Payload
void read_into(DataSource auto& src, Payload& dst) {
  read_into(src, dst.field1);
  read_into<std::endian::little>(src, dst.field2);
  read_into(src, dst.field3);
  read_into(src, dst.field5);
}

// mind you, nothing stops you from just reading directly into the struct, but beware of endianness issues:
// struct Payload
// {
//    char field1;
//    std::uint32_t field2;
//    std::uint32_t field3;
//    char field5;
// } __attribute__((packed));
// void read_into(DataSource auto& src, Payload& dst) {
//   src.read(reinterpret_cast<char*>(&dst), sizeof(Payload));
// }

// Example
struct some_data_source {
  std::size_t read(char*, std::size_t size);
};

void foo() {
    some_data_source data;

    Payload p;
    read_into(data, p);
}

An alternative API could have been dst.field2 = read<std::uint32_t>(src), which has the drawback of requiring to be explicit about the type, but is more appropriate if you have to deal with non-trivial constructors.

see it in action on godbolt: https://gcc.godbolt.org/z/77rvYE1qn

with this solution (the old good way, reading every field one by one instead of the entire structure as OP wanted to do), does the `Payload` still need to be `packed`? — Federico, May 26 '22 at 19:47
@Federico Well, nothing stops someone from just reading directly into the struct (I've amended the code in the answer to reflect that). But no, there is no hard requirement for the struct to be packed in that case. — , May 26 '22 at 19:50
@Frank, probably the struct doesn't have to be `packed`. Reasons for me to make it `packed` is two fold: 1. the bytes are continuous in the stream, 2. I'm worrying about the cache locality thing — Kelvinyu1117, May 27 '22 at 04:34
@Kelvinyu1117 Loading misaligned data into a register is much, much more expensive than the cache locality implication of padding structs like this. — , May 27 '22 at 04:49

How should I approach parsing the network packet using C++ template?

1 Answers1