
As far as I understand, the following piece of code exhibits undefined behaviour in C11:

#include <string.h>

struct aaaa { char bbbb; int cccc; };

int main(void) {
    unsigned char buffer[sizeof(struct aaaa)] = { 0 };
    struct aaaa *pointer = (struct aaaa *)&buffer[0];

    return (*pointer).cccc;
}

According to N1570 section 6.5.3.2 paragraph 4,

If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.

which is accompanied by a footnote that clarifies that

Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.

It's unlikely that struct aaaa and unsigned char have the same alignment requirement, so pointer is likely assigned an address that is inappropriately aligned for struct aaaa (an invalid value), and using *pointer therefore causes UB.
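To make the alignment argument concrete, here is a small sketch (assuming a C11 compiler; `print_alignments` is a hypothetical helper, not part of the question) that prints the two alignment requirements involved. The printed values are implementation-defined; only the two asserted relationships should hold on any conforming implementation.

```c
#include <assert.h>
#include <stdio.h>

struct aaaa { char bbbb; int cccc; };

/* What matters is the alignment requirement of the pointed-to types,
   not of the pointer types themselves. */
void print_alignments(void)
{
    /* unsigned char always has the weakest alignment requirement, 1. */
    assert(_Alignof(unsigned char) == 1);
    /* A struct is at least as strictly aligned as each of its members. */
    assert(_Alignof(struct aaaa) >= _Alignof(int));

    printf("alignof(unsigned char) = %zu\n", _Alignof(unsigned char));
    printf("alignof(struct aaaa)   = %zu\n", _Alignof(struct aaaa));
}
```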

However, can I copy the structure?

#include <string.h>

struct aaaa { char bbbb; int cccc; };

int main(void) {
    unsigned char buffer[sizeof(struct aaaa)] = { 0 };
    struct aaaa target;

    memcpy(&target, buffer, sizeof(struct aaaa));

    return target.cccc;
}

Here, we pass a struct aaaa * and an unsigned char * to memcpy. While that seems just as bad as the first piece of code, I can't find any wording in C11 stating that this code exhibits UB. Does this usage of memcpy cause undefined behaviour?

Lundin

2 Answers

No, memcpy doesn't make any assumptions about alignment. It is functionally equivalent to copying byte by byte.
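As a sketch of what "functionally equivalent to copying byte by byte" means, the hypothetical `bytewise_copy` below performs every access through an `unsigned char` lvalue. It only illustrates the specified behaviour; it is not how any particular C library implements `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct aaaa { char bbbb; int cccc; };

/* Copies n bytes as memcpy is specified to: each byte is read and written
   through an unsigned char lvalue, which C11 6.5p7 permits for objects of
   any type, and which has no alignment requirement beyond 1. */
void *bytewise_copy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Usage: the question's second example, with the loop in place of memcpy. */
int copy_demo(void)
{
    unsigned char buffer[sizeof(struct aaaa)] = { 0 };
    struct aaaa target, reference;

    bytewise_copy(&target, buffer, sizeof target);
    memcpy(&reference, buffer, sizeof reference);

    /* Both copies produce the same object representation. */
    assert(memcmp(&target, &reference, sizeof target) == 0);
    return target.cccc;
}
```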

BTW, accessing an auto object through an lvalue of a different type that is not a character type leads to undefined behavior, regardless of alignment. This is a violation of the effective type rule, C11 6.5 p6 and p7.
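One way to see that the effective type rule is separate from alignment: even if the buffer is declared with C11 `_Alignas` so that its address is suitable for `struct aaaa`, reading it through a `struct aaaa` lvalue is still undefined. A sketch (the function name `read_cccc` is hypothetical, and the alignment check assumes the common implementation-defined pointer-to-`uintptr_t` mapping) that keeps the aligned buffer but copies it into a real `struct aaaa` object instead:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct aaaa { char bbbb; int cccc; };

int read_cccc(void)
{
    /* _Alignas removes the alignment problem ... */
    _Alignas(struct aaaa) unsigned char buffer[sizeof(struct aaaa)] = { 0 };
    /* Assumes converting a pointer to uintptr_t yields its plain address. */
    assert((uintptr_t)(void *)buffer % _Alignof(struct aaaa) == 0);

    /* ... but not the effective-type one: buffer's declared type is still
       unsigned char[], so this would still be undefined behaviour:
       return ((struct aaaa *)buffer)->cccc; */

    /* Copying the bytes into an object declared as struct aaaa is fine. */
    struct aaaa target;
    memcpy(&target, buffer, sizeof target);
    return target.cccc;
}
```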

Jens Gustedt
  • But what does it mean to "copy byte by byte" in C11? Maybe the first byte will be fine, but how will you copy `buffer[1]` to `*(((unsigned char *)&target)+1)`? Surely, the latter expression has UB... –  Sep 07 '16 at 23:02
  • @Rhymoid, it is the same as casting the addresses to `char*` and `char const*`, respectively, and then doing a `for` loop for copying. Accessing individual bytes through character pointers is always allowed. – Jens Gustedt Sep 07 '16 at 23:06
  • While `memcpy` itself does not generally make assumptions about alignment, it may exploit assumptions a compiler makes *elsewhere* about alignment. For example, if a compiler sees that the arguments to `memcpy` are of type `uint32_t*`, it may generate `memcpy` code that will fail if the pointers aren't aligned for that type. Whether or not the compiler would normally care about the alignment of a pointer which is converted to `uint32_t*` and back to `void*` without being dereferenced as a `uint32_t`, conversion of a pointer to `uint32_t*` will invoke Undefined Behavior if it is not aligned... – supercat Sep 15 '16 at 20:36
  • ...for that type (even if it is never dereferenced as that type). Since the only way `memcpy` would receive unaligned pointers would be if UB had already occurred, there would be no need for the compiler to handle that case. – supercat Sep 15 '16 at 20:38
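Following supercat's point, one pattern that never forms a misaligned `uint32_t *` is to keep the source as an `unsigned char *` and let `memcpy` write into a properly declared `uint32_t`. This is a sketch; `load_u32` and `load_demo` are hypothetical names, and `uint32_t` is assumed to exist (it is optional in C11).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a uint32_t stored at an arbitrary byte address. p is never
   converted to uint32_t *, so no misaligned uint32_t * is ever formed
   (C11 6.3.2.3p7), and the compiler has no alignment assumption about p
   to exploit when it expands the memcpy. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Usage: a value round-trips through an odd offset of a byte buffer. */
void load_demo(void)
{
    unsigned char bytes[1 + sizeof(uint32_t)] = { 0 };
    uint32_t in = 0x12345678u;

    memcpy(bytes + 1, &in, sizeof in);
    assert(load_u32(bytes + 1) == in);
}
```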

From what I understand, both cases are UB (but not because of the call to memcpy), because the compiler does not properly enforce alignment of the start addresses of variables. You can enforce alignment with compiler-specific attributes to be sure, but this is of course a platform-specific solution.

Assuming the start addresses are aligned (an assumption from practice, since compilers usually align them to gain performance):

In your first example you take the address of buffer[0]. buffer is usually aligned correctly, and cccc will be aligned too, because the struct is not packed. It should not cause a problem in this case.
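The claim that `cccc` is aligned because the struct is not packed can be checked at compile time. A sketch using C11 `static_assert` and `offsetof` (`cccc_offset` is a hypothetical helper); the actual offset is implementation-defined, so the code only asserts the alignment relationship:

```c
#include <assert.h>  /* static_assert */
#include <stddef.h>  /* offsetof, size_t */

struct aaaa { char bbbb; int cccc; };

/* Without packing, padding after bbbb places cccc at a multiple of
   int's alignment, so cccc is aligned whenever the struct itself is. */
static_assert(offsetof(struct aaaa, cccc) % _Alignof(int) == 0,
              "cccc is suitably aligned within struct aaaa");
static_assert(_Alignof(struct aaaa) % _Alignof(int) == 0,
              "struct aaaa is at least as strictly aligned as int");

size_t cccc_offset(void)
{
    return offsetof(struct aaaa, cccc);
}
```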

In the second example, memcpy will copy everything properly, because internally it tries its best to do aligned copies for performance and falls back to copying byte by byte when that is not possible. And here again, all structures and buffers are aligned, subject to the restrictions I mentioned above.

What is the actual problem here?

You would risk it (visibly, in practice) if you assigned &buffer[1], which is usually not aligned. An access to cccc would then load a word from an unaligned address. On some architectures this causes the dreaded SIGBUS; x86 tolerates unaligned addresses, perhaps with a small slowdown, but does not crash.
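To illustrate why `&buffer[1]` is the risky case: whenever `int` has an alignment requirement greater than 1, `&buffer[0]` and `&buffer[1]` can never both be suitably aligned for it. A sketch, assuming the common but implementation-defined behaviour that converting a pointer to `uintptr_t` yields its numeric address; `is_aligned_for_int` and `count_int_aligned` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumes (uintptr_t)p is the plain numeric address of p. */
int is_aligned_for_int(const void *p)
{
    return (uintptr_t)p % _Alignof(int) == 0;
}

/* Returns how many of &buffer[0] and &buffer[1] are int-aligned. */
int count_int_aligned(void)
{
    unsigned char buffer[2 * sizeof(int)] = { 0 };
    int n = is_aligned_for_int(&buffer[0]) + is_aligned_for_int(&buffer[1]);

    /* Adjacent addresses differ by 1, so with alignof(int) > 1 at most
       one of them can be a multiple of it. */
    if (_Alignof(int) > 1)
        assert(n <= 1);
    return n;
}
```

Which of the two addresses (if either) is aligned depends on where the compiler places `buffer`, which is exactly why relying on it is fragile.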

Martin Sugioarto