I'm using a char array in a struct to hold some generic data, like this (the input type may be a struct of unknown size so I can't just use a union; this code is heavily simplified):

#include <string.h>

typedef struct {
    char buf[256];
} data;

void data_set_int(data *d, int a) {
    memcpy(d->buf, &a, sizeof(a));
}

int data_get_int(data *d) {
    int ret;
    memcpy(&ret, d->buf, sizeof(ret));
    return ret;
}

void data_set_float(data *d, float a) {
    memcpy(d->buf, &a, sizeof(a));
}

float data_get_float(data *d) {
    float ret;
    memcpy(&ret, d->buf, sizeof(ret));
    return ret;
}

int main(void) {
    data d;

    data_set_int(&d, 3);
    int int_result = data_get_int(&d);

    data_set_float(&d, 10.0f);
    float float_result = data_get_float(&d);

    return 0;
}

If I never attempt to write a float and then read the data as an int or vice versa, is this code well-defined in C(99)?

Compiling with GCC yields no warnings, and running the code gives the expected behavior (int_result == 3, float_result == 10.0f). Changing the memcpy to a normal pointer dereference (int ret = *(int *)d->buf) also works fine with no warnings.

All of the sources I've read on strict aliasing say that you can read any type as a char * (so I think that means set should be fine), but you cannot read a char * as any other type (not so sure that get is fine). Have I misunderstood the rule?

Andrew Sun
  • `memcpy` is the correct way to do this. – Cory Nelson Jan 07 '17 at 04:43
  • This is fine until you get to `*(int *)d->buf`. At that point all bets are off because `int` might have an alignment constraint that `d->buf` may or may not meet. – Gene Jan 07 '17 at 04:49
  • @Gene Ah I see, so if I tried e.g. `*(int *)(&d->buf[1])` then it would fail but `memcpy` would still work, right? Do strict aliasing rules not apply when `memcpy` is used? – Andrew Sun Jan 07 '17 at 04:57
  • @AndrewSun No. Alignment restrictions mean you'll get an error for some processors and compilers and not others. Strict aliasing prohibits deref'ing the same chunk of memory through two pointers of different _non-character_ types. You're not doing that. – Gene Jan 07 '17 at 05:04
  • @AndrewSun On the other hand, this is not a smart way to go. This is exactly what `union`s are for, and they'll produce more efficient code because they will guarantee alignment for all the field types, so bytewise copying isn't needed. – Gene Jan 07 '17 at 05:10
  • @chux Does the byte layout matter in this case? I am `memcpy`ing from `int -> char[] -> int` (or `float -> char[] -> float`) without ever touching the bytes in the buffer. – Andrew Sun Jan 07 '17 at 05:40
  • @chux I am always reading the same type that I wrote (I'm not trying to write an `int` and read a `float`, for instance). In my actual code, the type of data written to/read from the buffer is not known until runtime (and it can hold arbitrary structs too, so I can't just use a `union` of "all possible types"). I'm using it to implement a "generic" storage which can hold any data type. – Andrew Sun Jan 07 '17 at 05:57
  • @AndrewSun I read your question wrong: "If I never attempt to write a float and then read the data as an int or vice versa, is this code well-defined in C(99)?" I missed the "never" part. So concerning my previous comments, [never mind](https://www.youtube.com/watch?v=V3FnpaWQJO0) – chux - Reinstate Monica Jan 07 '17 at 06:02

1 Answer

Under C89, the behavior of memcpy was analogous to reading each byte of the source using an unsigned char*, and writing each byte of the destination using an unsigned char*; since character pointers may be used to access anything else, that made memcpy universal for purposes of data conversion.
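As a rough illustration of that byte-by-byte model, here is a minimal sketch. The names `bytewise_copy` and `roundtrip_float` are hypothetical and exist only for this example; a real `memcpy` is free to copy in larger units as long as the observable result is the same.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies n bytes one unsigned char at a time,
   mirroring the C89 description of memcpy. Regions must not overlap. */
static void *bytewise_copy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
    }
    return dst;
}

/* Round-trips a float through a char buffer: float -> char[] -> float. */
static float roundtrip_float(float in) {
    char buf[256];
    float out;
    assert(sizeof in <= sizeof buf);      /* the value must fit in the buffer */
    bytewise_copy(buf, &in, sizeof in);   /* write bytes of a declared float */
    bytewise_copy(&out, buf, sizeof out); /* read them back into a declared float */
    return out;
}
```

Because every access to `buf` goes through a character type, and both ends of the round trip are declared `float` objects, the value comes back unchanged.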

C99 added some new restrictions to memcpy. It can still be used when the destination object has a declared type, or when every non-character pointer type that will be used to read the destination is consistent with the effective type of the source. However, it leaves objects without a declared type in a state where they are only readable using the source type. I don't think C11 has eased those restrictions in any meaningful way.

Your code should not be affected by the memcpy rules since each memcpy operation either writes to an object with a declared type, or writes to storage which will only be read via memcpy to an object with a declared type. The main problem situation with C99's memcpy rules occurs when code needs to update objects in place without knowing the type with which they will next be read.
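The same reasoning extends to arbitrary structs, provided the value is always copied out into a declared object of the type that was copied in. A minimal sketch, assuming hypothetical helpers `data_set` and `data_get` and a hypothetical `item` struct (none of these are from the question):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char buf[256];
} data;

/* Hypothetical generic setter: copies n bytes in, returns -1 if they do not fit. */
static int data_set(data *d, const void *src, size_t n) {
    if (n > sizeof d->buf) return -1;
    memcpy(d->buf, src, n);
    return 0;
}

/* Hypothetical generic getter: copies n bytes out into a caller-declared object. */
static int data_get(const data *d, void *dst, size_t n) {
    if (n > sizeof d->buf) return -1;
    memcpy(dst, d->buf, n);
    return 0;
}

/* Hypothetical payload type, used only for this illustration. */
typedef struct {
    int id;
    float weight;
} item;

/* Stores an item and reads it back as the same type: item -> char[] -> item. */
static item roundtrip_item(item in) {
    data d;
    item out = {0, 0.0f};
    int rc = data_set(&d, &in, sizeof in);
    assert(rc == 0);
    rc = data_get(&d, &out, sizeof out);
    assert(rc == 0);
    return out;
}
```

The size check is the only thing the buffer itself can enforce; keeping track of which type was stored is still the caller's job.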

For example, on a system where both int and long have identical 32-bit representations, it should be possible to write a function that can load data into either an int[] or a long[] without having to know which kind of pointer it is receiving (the sequence of machine operations would be the same in either case). If code reads some data into a temporary int[] and then uses memcpy to move it to the final destination, the sequence would be guaranteed by the Standard to work if the destination is an actual declared object of type int[] or long[], or if it is a region of allocated storage that will be read as int[], but would not be guaranteed to work if it is a region of allocated storage that will next be read as long.
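A sketch of the two cases the Standard does guarantee in that scenario. The function names are hypothetical and the staged values are arbitrary. The non-guaranteed case (allocated storage that is next read as `long`) is deliberately left out of the code, since it only makes sense on a system where `int` and `long` share a representation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical loader: stages values in a temporary int[] and copies them
   to dest without knowing what type dest will be read as. */
static void load_values(void *dest, size_t count) {
    int tmp[4] = {1, 2, 3, 4};
    assert(count <= 4);
    memcpy(dest, tmp, count * sizeof tmp[0]);
}

/* Case 1: the destination is a declared int[] object. */
static int sum_declared(void) {
    int arr[4];
    load_values(arr, 4);
    return arr[0] + arr[1] + arr[2] + arr[3];
}

/* Case 2: the destination is allocated storage that is next read as int,
   which matches the effective type of the memcpy source. */
static int sum_allocated(void) {
    int *p = malloc(4 * sizeof *p);
    int sum;
    if (p == NULL) return -1;
    load_values(p, 4);
    sum = p[0] + p[1] + p[2] + p[3];
    free(p);
    return sum;
}
```

Both functions return 10 when the allocation succeeds. Changing `sum_allocated` to read the storage through a `long *` is the situation described above as not guaranteed, even where the machine code would be identical.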

supercat
  • So if I understood this right, as long as the source and destination types are the same, it is always safe to `memcpy` some data to an intermediate buffer, then `memcpy` it to its destination (`T -> char[] -> T`)? – Andrew Sun Jan 08 '17 at 02:34
  • The problematic situation occurs if the source has an effective type, the destination does not have a declared type, and the destination is next read using a type other than the source. In your example, the `memcpy` operations that copy data from `char[]` write it to a local variable which has a declared type. The Standard as written allows implementations to assume that a memcpy from a `char[]` to an object received from `memcpy` will not affect any object that was accessed using an `int*`, `float*`, etc. While it may seem absurd that implementations would do that... – supercat Jan 08 '17 at 18:21
  • ...implementations use the aliasing rules to justify a lot of equally-absurd behavior. – supercat Jan 08 '17 at 18:22