I have a large block of data where some operations would be fastest if the block were viewed as an array of 64-bit unsigned integers and others would be fastest if it were viewed as an array of 32-bit unsigned integers. By 'fastest', I mean fastest on average for the machines that will be running the code. My goal is to be near-optimal in all the environments running the code, and I think this is possible if I use a void pointer, casting it to one of the two types for dereferencing. This brings me to my questions:
1) If I use a void pointer, will casting it to one of the two types for dereferencing be slower than directly using a pointer of the desired type?
2) Am I correct in my understanding of the standard that doing this will not violate the strict aliasing rules, and that it will not produce any undefined or unspecified behaviour? The 32- and 64-bit types I am using exist and have no padding (this is checked with a static assertion).
3) Am I correct in understanding that the strict aliasing rules basically serve two purposes: type safety, and guarantees to the compiler that enable optimization? If so, and if no other dereferencing happens in any of the situations where this code will be executed, am I likely to lose out on any significant compiler optimizations?
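For reference, here is a minimal sketch of the setup described in questions 1 and 2, assuming the helper names `load64` and `load32` (hypothetical, not from any library). Converting a `void *` to `uint64_t *` or `uint32_t *` only changes the type the compiler uses for the access; on mainstream implementations the conversion itself typically emits no instructions, so the load should compile like one through a typed pointer. The conversion does not make mixed access legal, though: each load is still subject to the effective-type rules (C11 6.5p6-7). The static assertions show one way to state the no-padding requirement; C11 7.20.1.1 already guarantees that the exact-width types have no padding bits whenever they are provided.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint32_t, uint64_t, UINT32_MAX, UINT64_MAX */

/* The exact-width types are optional; their limit macros are defined only
   if the types are provided (C11 7.20.1p4), so the build stops here on a
   platform that lacks them. */
#if !defined(UINT32_MAX) || !defined(UINT64_MAX)
#error "uint32_t and uint64_t are required"
#endif

/* Redundant on a conforming implementation (C11 7.20.1.1 says these types
   have no padding bits), but they document the layout assumption. */
static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t must be 32 bits wide");
static_assert(sizeof(uint64_t) * CHAR_BIT == 64, "uint64_t must be 64 bits wide");
static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
              "one uint64_t must span exactly two uint32_t");

/* Hypothetical accessors. The cast from const void * is a compile-time type
   conversion; each load is well defined only if the object being read has
   an effective type compatible with the pointer type (C11 6.5p7). */
static inline uint64_t load64(const void *block, size_t i)
{
    return ((const uint64_t *)block)[i];
}

static inline uint32_t load32(const void *block, size_t i)
{
    return ((const uint32_t *)block)[i];
}
```

These helpers stay well defined only while each region of the block is written and read with a single element type; reading memory stored as `uint64_t` through `load32` is exactly the aliasing problem described in the edit below.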
I have tagged this with 'c11' because I need to prove from the C11 standard that the behaviour is well defined. Any references to the standard would be appreciated.
Finally, I would like to address a likely concern to be brought up in the responses, regarding "premature optimization". First, this code is being run on a diverse computing cluster where performance is critical, and I know that even a one-instruction slowdown in dereferencing would be significant. Second, testing this on all the hardware would take time I don't have if I am to finish the project. There are many different types of hardware, and I have a limited amount of time on site to actually work with them. However, I am confident that an answer to this question will enable me to make the right design choice anyway.
EDIT: An answer and comments pointed out that there is an aliasing problem with this approach, which I verified directly in the C11 standard. An array of unions would require two address calculations and dereferences in the 32-bit case, so I'd prefer a union of arrays. The questions then become:
1) Is there a performance problem in using a union member as an array as opposed to a pointer to the memory? I.e., is there a cost in union member access? Note that declaring two pointers to the arrays violates the strict aliasing rules, so access would need to be made directly through the union.
2) Are the contents of the array guaranteed to be preserved when written through one array and then read through the other?
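A minimal sketch of the union-of-arrays layout from the edit, next to the array-of-unions indexing it is meant to replace. The element count `BLOCK_WORDS`, the union names `block` and `word`, and the functions `get32_block` and `get32_words` are all hypothetical. Every union member starts at offset 0 (C11 6.7.2.1p16), so both views address the same bytes. When a member other than the one last stored is read, C11 6.5.2.3p3 (footnote 95) says the stored object representation is reinterpreted as the new type; which half of a 64-bit word appears at the lower 32-bit index depends on byte order, so any check should accept both big- and little-endian layouts.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* uint32_t, uint64_t */

#define BLOCK_WORDS 4  /* hypothetical number of 64-bit words in the block */

/* Union of arrays: both views share one storage area. */
union block {
    uint64_t u64[BLOCK_WORDS];
    uint32_t u32[2 * BLOCK_WORDS];
};

/* Array of unions: the alternative the edit wants to avoid. */
union word {
    uint64_t u64;
    uint32_t u32[2];
};

/* Every member of a union begins at the start of the union (C11 6.7.2.1p16). */
static_assert(offsetof(union block, u64) == 0, "64-bit view starts at offset 0");
static_assert(offsetof(union block, u32) == 0, "32-bit view starts at offset 0");

/* 32-bit element i through the union of arrays: a single index expression. */
static uint32_t get32_block(const union block *b, size_t i)
{
    return b->u32[i];
}

/* 32-bit element i through an array of unions: select the word, then the half. */
static uint32_t get32_words(const union word *w, size_t i)
{
    return w[i / 2].u32[i % 2];
}
```

Assuming `sizeof(union word) == 8`, the byte offset of `w[i / 2].u32[i % 2]` is `8 * (i / 2) + 4 * (i % 2)`, which equals `4 * i`, the same offset as `b->u32[i]`; an optimizing compiler may therefore emit the same addressing for both, although that is worth confirming in the generated assembly for each target compiler.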