C++ type aliasing, where value is replaced

Question

Is the following code legal in C++?

int get_i(int idx) { ... }
float transform(int i) { ... }
void use(float f) { ... }

static_assert(sizeof(int) == sizeof(float));
void* buffer = std::malloc(n * sizeof(int));

int* i_buffer = reinterpret_cast<int*>(buffer);
float* f_buffer = reinterpret_cast<float*>(buffer);

// Fill int values into the buffer
for(int idx = 0; idx < n; ++idx)
    i_buffer[idx] = get_i(idx);

// Transform int value to float value, and overwrite
// (maybe violates strict aliasing rule?)
for(int idx = 0; idx < n; ++idx)
    f_buffer[idx] = transform(i_buffer[idx]);

for(int idx = 0; idx < n; ++idx)
    use(f_buffer[idx]);

The second step reads the buffer value as an int, and then writes a float in its place. It never accesses the memory through i_buffer again afterwards, so there is no type aliasing when reading.

However the assignment f_buffer[idx] = writes a float object into an int object, which is UB.

Is there a way to make the compiler consider this to mean that the lifetime of the int should end, and a float object should be constructed in its place, so that there is no type aliasing?

Is this an example? Because my solution would be to not use `i_buffer` at all by doing `f_buffer[idx] = transform(get_i(idx));` or `use(transform(get_i(idx)));` — mch, Oct 04 '18 at 09:58
Your sequence is write i, read i, write f, read f, you do not seem to be running into a case where the aliasing could make the compiler assume something that is not true (as e.g. write i, read f would do). But why do you feel you have to do this, anyway? Just to save memory? — , Oct 04 '18 at 09:59
Yes it is an example, the real code is much more complex, but does a similar operation — tmlen, Oct 04 '18 at 09:59
Note that in the expression `f_buffer[idx] = transform(i_buffer[idx]);`, the evaluation of the left side of the `=` may precede the evaluation of `i_buffer[idx]` — Caleth, Oct 04 '18 at 10:01
All the three loops are UB. `malloc` doesn't create objects. As there is no `int[]` object at `i_buffer` at the first loop, `i_buffer[idx] = ...` is UB. And similarly, the other two loops are UB as well. Use `union`, and `new`. — geza, Oct 04 '18 at 10:02
@jakub_d: Aggressive "aliasing optimizations" will break the sequence "write A; write B; conditionally write A; later, using a *seemingly*-different condition, read either A or B" even if no object is ever read with a type other than the one used to write it. — supercat, Oct 04 '18 at 18:54

Maxim Egorushkin · Accepted Answer · 2018-10-04T20:26:36.477


However the assignment f_buffer[idx] = writes a float object into an int object, which is UB.

Yep, the above breaks type aliasing rules.

To fix that, for your values you can use a union:

union U {
    float f;
    int i;
};

And then access the corresponding members of the union.

This way when you do:

buffer[idx].i = ...; // make i the active union member
...
buffer[idx].f = transform(buffer[idx].i); // make f the active union member

it avoids UB because the lifetime of buffer[idx].i ends and that of buffer[idx].f starts.
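For illustration, here is a minimal, self-contained sketch of the three loops from the question rewritten on top of such a union. The names `Slot` and `run`, the bodies of `get_i` and `transform`, and the use of `std::vector` instead of `std::malloc` are assumptions made for this example only; they are not from the original code.

```cpp
#include <cstddef>
#include <vector>

// One slot holds either an int or a float, never both at once.
union Slot {
    int i;
    float f;
};

// Hypothetical stand-ins for the question's get_i and transform.
int get_i(int idx) { return idx * 2; }
float transform(int i) { return static_cast<float>(i) + 0.5f; }

// Fills the buffer with ints, overwrites each slot with a float,
// and returns the sum of the floats.
float run(std::size_t n) {
    // Assumption: std::vector is used so that real Slot objects exist.
    std::vector<Slot> buffer(n);

    for (std::size_t idx = 0; idx < n; ++idx)
        buffer[idx].i = get_i(static_cast<int>(idx));  // i becomes the active member

    for (std::size_t idx = 0; idx < n; ++idx) {
        const int value = buffer[idx].i;   // read the active member first
        buffer[idx].f = transform(value);  // then f becomes the active member
    }

    float sum = 0.0f;
    for (std::size_t idx = 0; idx < n; ++idx)
        sum += buffer[idx].f;              // only the active member is read
    return sum;
}
```

Copying the int into a local before the assignment is not strictly needed, but it makes the order "read i, then write f" explicit, which sidesteps the evaluation-order concern raised in the comments on the question.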

edited Oct 04 '18 at 20:26

answered Oct 04 '18 at 09:59

Maxim Egorushkin


Given `union U buffer[10];`, would `buffer[idx].f = 1.0f;` attempt to access the stored value of an object of type `union U`, specifically `buffer[idx]`? Is the glvalue used for access (i.e. `buffer[idx].f`) of type `float`? Does N3690 §3.10 p10 list `float` among the types that can be used to access the stored value of an object of type `union U`? Certainly any good compiler should allow such access, but is there anything in the Standard that would exempt such access from §3.10 p10 or is that merely a really popular (almost unanimously supported) extension? – supercat Oct 04 '18 at 19:05
@supercat My point is that when a union member is written to it becomes the active member. Reading the active member of a union is well defined. Note, that there is no union-cast here (when writing into one member and reading another). I did not understand your question. – Maxim Egorushkin Oct 04 '18 at 20:25
Certainly the authors of the Standard intended to allow union members to be accessed with at least some lvalues of union-member type, but I don't think the way N3690 §3.10 is written actually allows that. According to §3.10, any attempt to access the stored value of any object using a type which §3.10 does not list as suitable for such use automatically invokes UB, *even if the behavior of such access would otherwise be defined*. Do you see anything in §3.10 that would prevent member accesses from invoking UB, or anything else that specifically says §3.10 does not apply to such accesses? – supercat Oct 04 '18 at 20:47
I suspect much of the controversy surrounding the rules stems from the fact that, as written, they fail to define constructs that should clearly be defined, and the only way to make the language usable is to either (1) pretend the rules say something they don't actually say, in which case the legitimacy of other cases will depend upon how one tweaks the rules to make them usable, or (2) treat the ability to employ structures and unions as merely being a really really common extension which all quality implementations should support, even though it's not actually mandated by the Standard. – supercat Oct 04 '18 at 22:29
@supercat I am sorry, I fail to make sense of what you say. – Maxim Egorushkin Oct 05 '18 at 09:21
My question is whether there is anything in the Standard that would cause an access to the stored value of a struct or union object via a member lvalue of non-character type not to be a violation of N3690 §3.10 p10. Compilers are allowed to (and should) treat such access sensibly whether the Standard requires them to or not, but if the behavior isn't actually defined, an optimizer could omit code which reads a 64-bit `long` from a union and writes the same bit pattern as a 64-bit `long long`, even if preceding code writes a `long` and following code reads a `long long`. – supercat Oct 05 '18 at 15:10
