Cost of union access vs using fundamental types

Question

I have a large block of data where some operations would be fastest if the block were viewed as an array of 64 bit unsigned integers and others would be fastest if viewed as an array of 32 bit unsigned integers. By 'fastest', I mean fastest on average for the machines that will be running the code. My goal is to be near optimal in all the environments running the code, and I think this is possible if I use a void pointer, casting it to one of the two types for dereferencing. This brings me to my questions:

1) If I use a void pointer, will casting it to one of the two types for dereferencing be slower than directly using a pointer of the desired type?

2) Am I correct in my understanding of the standard that doing this will not violate the anti-aliasing rules, and that it will not produce any undefined or unspecified behaviour? The 32 and 64 bit types I am using exist and have no padding (this is a static assertion).

3) Am I correct in understanding the anti-aliasing rules to basically serve two purposes: type safety and compiler guarantees to enable optimization? If so, and if the code I am discussing only ever runs in situations where no other dereferencing is happening, am I likely to lose out on any significant compiler optimizations?

I have tagged this with 'c11' because I need to prove from the c11 standard that the behaviour is well defined. Any references to the standard would be appreciated.

Finally, I would like to address a concern likely to be brought up in the responses: "premature optimization". First off, this code is being run on a diverse computing cluster, where performance is critical, and I know that even a one-instruction slowdown in dereferencing would be significant. Second, testing this on all the hardware would take time I don't have to finish the project. There are a lot of different types of hardware, and I have a limited amount of time on site to actually work with the hardware. However, I am confident that an answer to this question will enable me to make the right design choice anyway.

EDIT: An answer and comments pointed out that there is an aliasing problem with this approach, which I verified directly in the c11 standard. An array of unions would require two address calculations and dereferencings in the 32 bit case, so I'd prefer a union of arrays. The questions then become:

1) Is there a performance problem in using a union member as an array as opposed to a pointer to the memory? I.e., is there a cost in union member access? Note that declaring two pointers to the arrays violates the anti-aliasing rules, so access would need to be made directly through the union.

2) Are the contents of the array guaranteed invariant when accessed through one array then through the other?

And your run time check (if()) does not affect the performance? If you want it really blazingly optimal, do 2 builds and check at install time which of the 2 to install. — BitTickler, Apr 21 '15 at 00:22
Sorry, I wasn't very clear on that point. Some operations are better with 64 bit types, and others are better with 32 bit types, so I'd be using both union members during a given execution. — jack, Apr 21 '15 at 00:36
IIRC, type-punning through a union is explicitly allowed in C99. As for the performance part, yes there may be a performance penalty if you write to memory using one word-size and immediately read from it using a different word-size or a different alignment. When exactly this happens varies by processor. If you want to stay clear of this penalty, I'd recommend keeping at least 100 cycles between the write and the read. (100 cycles is probably overkill, but I've never actually benchmarked it to get a more accurate number.) — Mysticial, Apr 21 '15 at 00:41
From Mike Acton's Understanding Strict Aliasing: "Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias eachother.) ". So casting a void * pointer to both 32 and 64 bit integer pointers and doing dereferences can break strict aliasing. — Craig S. Anderson, Apr 21 '15 at 00:45
@Olaf - Yes, using a union is best from a strict aliasing point of view, and it makes the code easier to read versus casting. — Craig S. Anderson, Apr 21 '15 at 00:58
@jack: I think the major point for the layout is whether you have to process the whole array as one size, then the other, or process each element partly in both modes. I cannot point at it, but I have a bad feeling about mixing access types. Not so much for aliasing, etc., but for performance reasons, as this may force more stores than actually required. It might really be better to process the whole array step-wise for each width (or use a second array as destination). The cache may be your friend. — too honest for this site, Apr 21 '15 at 03:43

too honest for this site · Answer 1 · 2015-04-21T13:34:04.550

1

I would refrain from using a void pointer. A union of two arrays or an array of unions will do better.
Use proper alignment on the whole type. C11 provides the _Alignas keyword (spelled alignas via <stdalign.h>). GCC has non-standard alignment attributes (which also work with pre-C11 standards). Other compilers may have none at all. Depending on your architecture, there should be no performance impact, although this cannot be guaranteed (I do not see an issue here, however). You might even align the type to something larger than 64 bits so that it fills a cache line exactly. That might speed up prefetch and write-back.
Aliasing refers to the fact that an object is referenced by multiple pointers at the same time. This means the same memory address can be addressed using two different "sources". The problem is that the compiler may not be aware of this and thus may hold the value of a variable in a CPU register during some calculation without writing it back to memory immediately. If the same variable is then referenced through the other "source" (i.e. pointer), the compiler may read stale data from the memory location. IMO, aliasing is only relevant within a function if two pointers are passed in. So, if you do not intend to pass two pointers to the same object (or part of it), there should be no problem at all. Otherwise, you should get comfortable with (compiler) barriers. Edit: The C standard seems to be a bit stricter on that, as it requires the lvalues used to access an object to fulfill certain criteria (C11 6.5/7 (n1570), thanks to Matt McNabb).
Oh, and don't use int/long/etc. You really should use stdint.h types if you need properly sized types.
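
As a rough illustration of the union-of-arrays layout suggested above, here is a minimal C11 sketch. The names block_t, BLOCK_WORDS64, block_fill64, block_xor64 and block_xor32 are made up for this example, and the 64-byte alignment assumes a 64-byte cache line, which may not match your hardware.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_WORDS64 1024u /* hypothetical block size, in 64-bit words */

/* One object with two views. alignas(64) assumes a 64-byte cache line. */
typedef union {
    alignas(64) uint64_t u64[BLOCK_WORDS64];
    uint32_t u32[2u * BLOCK_WORDS64];
} block_t;

/* Both views must cover exactly the same bytes. */
static_assert(sizeof(((block_t *)0)->u64) == sizeof(((block_t *)0)->u32),
              "union views differ in size");

/* Fill the block through the 64-bit view. */
void block_fill64(block_t *b, uint64_t seed)
{
    for (size_t i = 0; i < BLOCK_WORDS64; i++)
        b->u64[i] = seed * (uint64_t)(i + 1u);
}

/* XOR of all elements, read through the 64-bit view. */
uint64_t block_xor64(const block_t *b)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < BLOCK_WORDS64; i++)
        acc ^= b->u64[i];
    return acc;
}

/* XOR of all elements, read through the 32-bit view of the same bytes. */
uint32_t block_xor32(const block_t *b)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < 2u * BLOCK_WORDS64; i++)
        acc ^= b->u32[i];
    return acc;
}
```

Every access goes through a member of the union object, so no pointer of one element type is ever pointed at storage declared as the other. Folding the 64-bit result as (uint32_t)(x ^ (x >> 32)) should match block_xor32 regardless of byte order, which makes a cheap sanity check that both views see the same bytes.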

edited Apr 21 '15 at 13:34

answered Apr 21 '15 at 00:37

too honest for this site


2

Ok, I have no problem being downvoted, but I would really like to know where I was wrong? – too honest for this site Apr 21 '15 at 00:59
I found your response helpful, I didn't downvote it. – jack Apr 21 '15 at 01:43
@jack: Thanks! I just hate it getting no feedback why I am wrong with a statement. How should I learn from that? – too honest for this site Apr 21 '15 at 01:47
Para 3 isn't quite right. Aliasing refers to accessing an object via an expression of different type. There might only be one pointer involved. – M.M Apr 21 '15 at 03:12
@MattMcNabb: Right from WP (https://en.wikipedia.org/wiki/Aliasing_(computing)), first paragraph. Can you please give me a reference for your position? (I'm really interested). Hmm.. I'll have a look at some other docs meanwhile. – too honest for this site Apr 21 '15 at 03:21
However: when using a union, that would be just a single type, so with a single pointer to that union, there would be no aliasing either way. – too honest for this site Apr 21 '15 at 03:26
@MattMcNabb: Do you refer to something like this (http://thiemonagel.de/2010/01/no-strict-aliasing/)? That is still two ways to reference the same variable (directly and through the pointer). – too honest for this site Apr 21 '15 at 03:31
My understanding of the aliasing issue is that each access to an object is assigned an 'effective type'. If you are directly accessing the object as a variable, structure, or union member, the effective type for the access is the type of the variable. If you are using a pointer, the effective type is the type that was used for the last access (if that was not a character type). So the program basically treats an object in memory as having a type, which is implicitly given in the case of allocated space. You have to stay close to this type in a precise way. See the standard, first part of 6.5. – jack Apr 21 '15 at 03:50
So, in particular, using pointers to different types will typically mean one pointer or the other won't match the last used effective type, and changing effective types is usually undefined behaviour (unless you stay close in a precisely defined way). I think the point being made is that there are also other ways, not necessarily involving two pointers, to violate those rules. – jack Apr 21 '15 at 03:51
@Olaf my reference is the C standard. See C11 6.5/7. The example you link to does indeed only involve one pointer. – M.M Apr 21 '15 at 04:08
@jack your last comment is unclear. For objects with a declared type, the effective type matches that, so "last used" is irrelevant; the use (read or write) has to match the declared type. For objects with no declared type (i.e. malloc'd space) then the last *write* sets the effective type, and all reads have to match that type (with some exceptions as noted in 6.5/7) – M.M Apr 21 '15 at 04:10
Ya, I should have said 'write' instead of 'use'. However, I wasn't trying to say by the 'last use' thing that this is a summary of the rule. I was saying that if you are writing with two pointers that are not interchangeable by section 6.5, then you will get undefined behaviour. I was trying to point out that this is a special case of the rule, and sometimes confusion arises by equating this special case with the entire rule. – jack Apr 21 '15 at 04:15
@MattMcNabb: I would say this is correct for both a union of arrays or an array of unions, as the type accessed is actually that of the lvalue to access it. It is, however not allowed to have an array of u64 for instance and access the elements by a pointer to u32, which would be the case if using a void * and casting it arbitrarily. – too honest for this site Apr 21 '15 at 13:26

score 1 · Answer 2 · answered Apr 21 '15 at 07:15

There are different aspects to your question. First of all, interpreting memory with different types has several problems:

aliasing
alignment
padding

Aliasing is a "local" problem. Inside a function, you don't want to have pointers to the same object that have different target types. If you modify an object through such pointers, the compiler is allowed to assume that the object has not changed and may optimize your program in ways that break it. If you don't do that inside a function (e.g. do a cast right at the beginning and stay with that interpretation), you should be fine for aliasing.

Alignment problems are often overlooked nowadays because many processors are quite tolerant of misaligned accesses, but relying on that is not portable and might also have a performance impact. So you have to ensure that your array is aligned in a way that is suitable for all types through which you access it. This can be done with _Alignas in C11; older compilers have extensions that also allow for this. C11 adds some restrictions on alignment, e.g. that it is always a power of 2, which should enable you to write portable code with respect to this problem.
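
To make the alignment point concrete, here is a small C11 sketch. The names view_t, demo_view, is_power_of_two and is_aligned_to are invented for illustration, the 32-byte over-alignment is an arbitrary choice, and the run-time check relies on the common (implementation-defined) mapping of pointers to uintptr_t.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer type viewed as either 32-bit or 64-bit elements. */
typedef union {
    uint32_t u32[16];
    uint64_t u64[8];
} view_t;

/* A union is at least as strictly aligned as its strictest member. */
static_assert(alignof(view_t) >= alignof(uint64_t),
              "view_t is not aligned for 64-bit access");

/* Over-aligned instance; 32 bytes is an arbitrary choice for this sketch. */
alignas(32) view_t demo_view;

/* C11 alignments are powers of two, so a single-bit test is enough. */
bool is_power_of_two(size_t a)
{
    return a != 0 && (a & (a - 1)) == 0;
}

/* Run-time check that p is a multiple of alignment (a power of two). */
bool is_aligned_to(const void *p, size_t alignment)
{
    return is_power_of_two(alignment)
        && ((uintptr_t)p & (alignment - 1)) == 0;
}
```

A check like is_aligned_to(&demo_view, alignof(uint64_t)) is mainly useful in debug builds or for memory that comes from outside the program, since the compiler already guarantees alignment for objects declared with the union type.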

Integer type padding is rare these days (the only exception is _Bool), but to be sure you should use types that are known not to have problems with it. In your case these are [u]int32_t and [u]int64_t, which are guaranteed to have exactly the number of bits requested and to use two's complement representation for the signed types. If a platform doesn't support them, your program would simply not compile.
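
The no-padding assumption can be written down as compile-time checks, along the lines of the static assertion mentioned in the question. This is only a sketch: count32_for is a hypothetical helper, and the assertions are redundant on any platform that actually provides the exact-width types, so they serve mainly as documentation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* The exact-width types are optional; refuse to build without them. */
#if !defined(UINT32_MAX) || !defined(UINT64_MAX)
#error "exact-width 32- and 64-bit unsigned types are required"
#endif

/* No padding bits: storage size in bits equals the nominal width. */
static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t has padding");
static_assert(sizeof(uint64_t) * CHAR_BIT == 64, "uint64_t has padding");

/* Number of 32-bit elements that cover n64 64-bit elements. */
size_t count32_for(size_t n64)
{
    return n64 * (sizeof(uint64_t) / sizeof(uint32_t));
}
```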
