C++: Casting unsigned char to a Structure

Question

What I am trying to do

typedef struct {
    unsigned char a;
    unsigned char b;
    unsigned int  c;
} Packet;

unsigned char buffer[] = {1, 1, 0, 0, 0, 1};
Packet pkt = (Packet)buffer;

Basically, I am trying to cast a byte array to a structure in C++. When compiling, I get:

No matching function call for Packet::Packet(unsigned char[6])

Is this not possible or do I have to manually index into the array?

In general, whenever you feel the need to do a C-style cast in your C++ program, you should take that as a sign that you're doing something wrong. — Some programmer dude, Jan 20 '23 at 05:28
As for your problem, if the size of the structure is exactly equal to the size of the array, use the array as a *pointer* to a "Packet" structure object and copy it into the `Packet` object. — Some programmer dude, Jan 20 '23 at 05:29
Unrelated but important: add `static_assert(sizeof(Packet) == 6);`. Also note that in C++ you don't need `typedef struct`. — Evg, Jan 20 '23 at 05:31
I would just manually index the array. It is the safest way to deserialize binary data, since it makes no assumptions at all about how the struct is laid out in memory. If you actually have performance (or memory space) problems, then start optimizing. Also, casting memory to a C++ object results in UB (the object will not be in a valid state, since its constructor will not have been called). — Pepijn Kramer, Jan 20 '23 at 05:41
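
The manual-indexing approach suggested in the comments can be sketched as follows. This is a minimal illustration, not the commenter's code: the function name `parse_packet` is hypothetical, and the sketch assumes the 6-byte buffer holds `a`, `b`, and then `c` as a 4-byte little-endian value, which the question does not actually specify.

```cpp
#include <cstdint>

struct Packet {
    unsigned char a;
    unsigned char b;
    unsigned int  c;
};

// Hypothetical helper: builds a Packet field by field from 6 wire bytes.
// Assumes c is stored little-endian at offsets 2..5 (an assumption; the
// question does not state the byte order).
Packet parse_packet(const unsigned char (&buf)[6]) {
    Packet pkt{};
    pkt.a = buf[0];
    pkt.b = buf[1];
    pkt.c = static_cast<std::uint32_t>(buf[2])
          | static_cast<std::uint32_t>(buf[3]) << 8
          | static_cast<std::uint32_t>(buf[4]) << 16
          | static_cast<std::uint32_t>(buf[5]) << 24;
    return pkt;
}
```

Because each field is assigned individually, the result does not depend on padding, packing flags, or the host's byte order. With the question's buffer `{1, 1, 0, 0, 0, 1}` and the little-endian assumption, `c` comes out as 0x01000000.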

score 1 · Accepted Answer · answered Jan 20 '23 at 05:55

There are a few ways to do this:

// packet.h
////////////////
struct Packet {
    unsigned char a;
    unsigned char b;
    unsigned int  c;
};

If you compile and dump the structs with pahole, you will see the padding:

$ pahole -dr --structs main.o
struct Packet {
        unsigned char              a;                    /*     0     1 */
        unsigned char              b;                    /*     1     1 */

        /* XXX 2 bytes hole, try to pack */

        unsigned int               c;                    /*     4     4 */

        /* size: 8, cachelines: 1, members: 3 */
        /* sum members: 6, holes: 1, sum holes: 2 */
        /* last cacheline: 8 bytes */
};

So it's basically the 2 chars, 2 padding bytes and 4 bytes of an int for a total of 8 bytes.
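
The same layout can also be checked from inside the program with `sizeof` and `offsetof`, without an external tool. A small sketch; `hole_before_c` is a made-up helper name, and the actual numbers depend on the compiler, its flags, and the ABI:

```cpp
#include <cstddef>   // offsetof, std::size_t

struct Packet {
    unsigned char a;
    unsigned char b;
    unsigned int  c;
};

// Hypothetical helper: number of padding bytes between the end of b
// and the start of c.
constexpr std::size_t hole_before_c() {
    return offsetof(Packet, c) - (offsetof(Packet, b) + sizeof(unsigned char));
}
```

On the layout that pahole reports above, `hole_before_c()` would be 2 and `sizeof(Packet)` would be 8.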

Because x86 is a little-endian platform, the least significant byte comes first, as in:

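#include <cstdio>
#include "packet.h"

// Prints the three fields of the packet that pkt points to.
void print_packet( Packet* pkt ) {
    printf( "a:%d b:%d c:%u\n", int(pkt->a), int(pkt->b), pkt->c );
}
int main() {
    unsigned char buffer[] = {1, 1, 0, 0, 1, 0, 0, 0};
    // Both casts reinterpret the byte buffer as a Packet (technically undefined behavior).
    print_packet( (Packet*) buffer );
    print_packet( reinterpret_cast<Packet*>(buffer));
}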

Produces:

$ g++ main.cpp -o main
$ ./main
a:1 b:1 c:1
a:1 b:1 c:1

However, one can change the packing from the command line, as below, where we set the alignment to 2 bytes.

$ g++ -ggdb  main.cpp -o main -fpack-struct=2
$ pahole -dr --structs main
struct Packet {
        unsigned char              a;                    /*     0     1 */
        unsigned char              b;                    /*     1     1 */
        unsigned int               c;                    /*     2     4 */

        /* size: 6, cachelines: 1, members: 3 */
        /* last cacheline: 6 bytes */
} __attribute__((__packed__));

Then you can see that the Packet struct is only 6 bytes, and the result of running main is completely different:

$ ./main
a:1 b:1 c:65536
a:1 b:1 c:65536

This is because c now starts at offset 2, so its bytes are 00 00 01 00, which read as little-endian give 0x00010000, or 65536.
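
To see where the two values of c come from, one can decode the four bytes at each candidate offset by hand. The helper `read_le32` below is a hypothetical name introduced for this illustration; it assumes a little-endian encoding, matching the machine used in this answer:

```cpp
#include <cstdint>

// Hypothetical helper: interprets the 4 bytes starting at p as a
// little-endian 32-bit value, independent of the host's own byte order.
std::uint32_t read_le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | static_cast<std::uint32_t>(p[1]) << 8
         | static_cast<std::uint32_t>(p[2]) << 16
         | static_cast<std::uint32_t>(p[3]) << 24;
}
```

For the buffer `{1, 1, 0, 0, 1, 0, 0, 0}`, `read_le32(buffer + 4)` gives 1 (the padded layout, with c at offset 4), while `read_le32(buffer + 2)` gives 65536 (the packed layout, with c at offset 2).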

So, to avoid being at the mercy of these compiler shenanigans, it is better to define your packet in code with explicit packing and explicit padding bytes:

// packet.h
////////////////
struct [[gnu::packed]] Packet {
    unsigned char a;
    unsigned char b;
    unsigned char reserved[2];
    unsigned int  c;
};

Then execution becomes

$ g++ -ggdb  main.cpp x.cpp -o main -fpack-struct=2
$ ./main
a:1 b:1 c:1
a:1 b:1 c:1
$ g++ -ggdb  main.cpp x.cpp -o main -fpack-struct=4
$ ./main
a:1 b:1 c:1
a:1 b:1 c:1
$ g++ -ggdb  main.cpp x.cpp -o main -fpack-struct=8
$ ./main
a:1 b:1 c:1
a:1 b:1 c:1
$ g++ -ggdb  main.cpp x.cpp -o main -fpack-struct=16
$ ./main
a:1 b:1 c:1
a:1 b:1 c:1

there are dedicated typedefs with guaranteed sizes. Instead of `unsigned char` one should use `uint8_t` or `std::byte` since C++17. — Sergey Kolesnik, Jan 20 '23 at 06:19
@SergeyKolesnik Even if I substitute `unsigned char` for `std::byte` the padding hole of 2 bytes would still exist. — Something Something, Jan 20 '23 at 06:29
And then you switch compiler or compiler settings, or to another computer (e.g. from 32 to 64 bits machine) and everything breaks down again. For maintainable/portable code this will not work. — Pepijn Kramer, Jan 20 '23 at 06:47
@PepijnKramer I didn't address type punning, hence haven't said anything about the code being safe. Non-UB code would use `std::copy` to copy a byte array into an instantiated POD structure (prior to C++20 anyway), not access the data via `reinterpret_cast`/type punning — Sergey Kolesnik, Jan 20 '23 at 06:54

score 0 · Answer 2 · edited Jan 20 '23 at 06:49

You can do this with a reinterpret_cast from the array:

Packet pkt = *reinterpret_cast<Packet*>(buffer);

What this does is decay the array into a pointer to its 1st element, then treat that pointer as a Packet* pointer, then we dereference that and copy it into a new Packet structure. This circumvents essentially all compiler type and safety checks, so you need to be very careful here.

One thing we can do to make this a bit safer is to use a static_assert to ensure that the structure is the size that we expect. This will then fail to compile if the compiler inserts any padding into the structure definition.

static_assert(sizeof(Packet) == 6);

Depending on your compiler and compilation settings, it is almost certain that your structure as written is NOT 6 bytes.

Any time you are using reinterpret_cast, you are working very close to the realm of undefined / compiler-dependent behavior. Generally speaking, as long as you do the padding checks and only deal with primitive data types inside the structure, things will work as you would expect, even if the code is technically undefined according to the C++ standard. Compiler writers realize this type of code is often needed and so generally support it in a sane way, even if not required to by the C++ standard.

score 0 · Answer 3 · answered Jan 20 '23 at 06:08

First of all, your assumption that the byte representation of your structure is exactly what you write in the struct definition is wrong for most current architectures. For example, on a 32-bit architecture your definition will be equivalent to

struct Packet {
  char a;
  char b;
  char __hidden_padding[2];
  int c;
};

A similar thing, possibly with a different amount of padding, will happen on a 64-bit architecture. So, to avoid this, you need to tell the compiler to "pack" the structure without padding bytes. There is no standard syntax for this, but most compilers provide a means to do it. For example, for gcc/clang you can do:

struct [[gnu::packed]] Packet {
  char a;
  char b;
  int c;
};

Warning: when working with such structures, it is not advised to take the address of their members; see Is gcc's __attribute__((packed)) / #pragma pack unsafe?.

Now, since "simple" types like char, int, etc. have implementation-defined sizes, it is much better to use fixed-size types, and finally to check that the structure size is what you expect, as Evg suggested:

struct [[gnu::packed]] Packet {
  int8_t a;
  int8_t b;
  int32_t c;
};
static_assert(sizeof(Packet) == 6);

Copying is best done with either std::bit_cast, if you have C++20, or just memcpy. These two are the only standard ways today, as far as I know. Using *reinterpret_cast<Packet*>(buffer) is undefined, though it still works with most compilers.
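
A minimal sketch of the memcpy route, assuming GCC or Clang (for `[[gnu::packed]]`) and a buffer of exactly 6 bytes. The function name `to_packet` is invented for this example, and the resulting value of `c` depends on the host's byte order:

```cpp
#include <cstdint>
#include <cstring>

struct [[gnu::packed]] Packet {
    std::int8_t  a;
    std::int8_t  b;
    std::int32_t c;
};
static_assert(sizeof(Packet) == 6, "Packet must match the 6-byte wire size");

// Hypothetical helper: copies the raw bytes into a real Packet object,
// so no pointer to the buffer is ever treated as a Packet*.
Packet to_packet(const unsigned char (&buffer)[sizeof(Packet)]) {
    Packet pkt;
    std::memcpy(&pkt, buffer, sizeof pkt);
    return pkt;
}
```

Taking the array by reference to exactly `sizeof(Packet)` bytes makes a size mismatch a compile-time error rather than an out-of-bounds read. With C++20, `std::bit_cast<Packet>` applied to a `std::array<unsigned char, 6>` should achieve the same result without the explicit copy.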
