What is the most efficient way to access an aligned T & from a char[]?

Question

I was working on this class last night as a type-safe wrapper for memory aligned objects. I have the byte array and the math to access the byte array's memory for reading and writing as T. I am curious, though, how I can provide the most efficient access to the aligned T.

I tried using a public T & called Value which I would initialize to the aligned T in the constructor initializer list. Like this:

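#include <cstddef>
#include <cstdint>
#include <new>

template <typename T, std::size_t alignment = 64>
struct Aligned {
private:
    std::uint8_t bytes[sizeof(T) + alignment - 1];
public:
    T & Value;
    // Round the buffer address up to the next multiple of alignment (which must be a power of two).
    Aligned(T const & value = T()) : Value(*reinterpret_cast<T *>(((std::uintptr_t)bytes + (alignment - 1)) & ~std::uintptr_t(alignment - 1))) {
        new (&Value) T(value); // construct in place rather than assign to raw storage
    }
};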

That increases the size of the class by sizeof(T *) since T & Value needs to store the address of the aligned T.

My other approach is to not store the address but to calculate it each time access is required, via accessor methods...

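#include <array>
#include <cstddef>
#include <cstdint>
#include <new>

template <typename T, std::size_t alignment = 64>
struct Aligned {
private:
    // Not const: the storage is written through value(). Oversized so an aligned T always fits.
    std::array<std::uint8_t, sizeof(T) + alignment - 1> bytes;
    // Round the buffer address up to the next multiple of alignment (which must be a power of two).
    T * address() const {
        return reinterpret_cast<T *>(((std::uintptr_t)bytes.data() + (alignment - 1)) & ~std::uintptr_t(alignment - 1));
    }
public:
    T const & value() const {
        return *address();
    }
    void value(T const & x) {
        *address() = x;
    }
    Aligned(T const & x = T()) {
        new (address()) T(x); // construct in place rather than assign to raw storage
    }
};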

This approach will require pointer arithmetic and a pointer dereference (I think?) for each access but adds nothing to the size of the class.

Are there any other approaches or tricks to get both advantages?

I might not understand the problem but should it not be `std::uint8_t bytes[alignment - (sizeof(T) % alignment)];` ? — andre, Jun 07 '13 at 16:31
@andre: That only works if alignment > sizeof(T), which is perhaps not always the case. I guess we could have some ternary operator choose the correct size. — Mats Petersson, Jun 07 '13 at 16:36
I might be using the wrong terminology here. I am aiming for a wrapper class which allows easy writing of classes whose members should be aligned and spaced so to avoid false sharing. — Nick Strupat, Jun 07 '13 at 16:36
@MatsPetersson I was thinking `(sizeof(T) % alignment)` means whatever size T is we take the mod of it which is always less than alignment. — andre, Jun 07 '13 at 18:41
Ah, but that would also put objects too close together. If the objects need to be 64 bytes apart, then storing it in, say `64-(4 %64)` = `60` bytes isn't going to work well. — Mats Petersson, Jun 07 '13 at 22:47
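
To make the size arithmetic in the comments above concrete, here is a small sketch. The function names are made up for illustration, and it assumes a compiler with C++11 `constexpr`. It compares the buffer size used in the question with the size proposed in the first comment, for a 4-byte T and 64-byte alignment.

```cpp
#include <cstddef>

// Bytes needed so that an aligned T fits wherever the buffer happens to start
// (the size used in the question).
constexpr std::size_t worst_case_size(std::size_t size, std::size_t alignment) {
    return size + alignment - 1;
}

// Size proposed in the first comment.
constexpr std::size_t proposed_size(std::size_t size, std::size_t alignment) {
    return alignment - (size % alignment);
}

// Bytes skipped between a buffer starting at address `start`
// and the next address that is a multiple of `alignment`.
constexpr std::size_t padding_for(std::size_t start, std::size_t alignment) {
    return (alignment - start % alignment) % alignment;
}

static_assert(worst_case_size(4, 64) == 67, "question's buffer size");
static_assert(proposed_size(4, 64) == 60, "comment's buffer size");
// A buffer starting one byte past a boundary skips 63 bytes and then needs
// 4 more for T, so 60 bytes would not be enough.
static_assert(padding_for(1, 64) + 4 == 67, "worst case needs 67 bytes");
// For a T larger than the alignment, the proposed size is smaller than T itself.
static_assert(proposed_size(100, 64) == 28, "too small for a 100-byte T");
```

The last assertion is the case Mats Petersson points out: once sizeof(T) exceeds the alignment, the modulo-based size cannot even hold the object.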

score 2 · Answer 1 · answered Jun 07 '13 at 17:07

If you have access to C++11, you can use the new alignas keyword to get the compiler to align a type or variable for you.

alignas(64) classA myA;
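
Applied to the wrapper from the question, that could look roughly like the sketch below. `AlignedValue` and `is_aligned` are hypothetical names, not something from the question, and the sketch assumes the compiler supports `alignas` on class templates.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper: the compiler aligns and pads the struct itself,
// so there is no address arithmetic and no stored pointer or reference.
template <typename T, std::size_t alignment = 64>
struct alignas(alignment) AlignedValue {
    T value;
};

// sizeof is rounded up to a multiple of the alignment, so neighbouring
// array elements never share the same 64-byte block.
static_assert(alignof(AlignedValue<int>) == 64, "alignment applied");
static_assert(sizeof(AlignedValue<int>) == 64, "padded to one full block");

// Helper for checking an address at run time.
inline bool is_aligned(const void * p, std::size_t a) {
    return reinterpret_cast<std::uintptr_t>(p) % a == 0;
}
```

Access is then just `x.value`, which costs the same as any other member access. The alignment has to be a power of two that is at least alignof(T), otherwise the declaration is ill-formed.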

zindorsky

VS11 doesn't have alignas. – Nick Strupat Jun 07 '13 at 17:21
@MooingDuck: If you have C++03, how would you use `aligned_storage` that was added in C++11? – Casey Jun 07 '13 at 19:52
@Casey: oops, I thought it was C++03. Well, boost has it: http://www.boost.org/doc/libs/1_41_0/boost/aligned_storage.hpp – Mooing Duck Jun 07 '13 at 20:42
To be fair, `aligned_storage` was originally in TR1, so lots of pre-C++11 compilers probably have a compatible implementation. – Casey Jun 07 '13 at 21:11
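
For reference, a wrapper built on the `aligned_storage` mentioned in these comments might look like the sketch below. `AlignedBox` is a hypothetical name, and the sketch assumes the standard library accepts the requested alignment (support for alignments beyond the default maximum is implementation-defined). `std::aligned_storage` is part of C++11 and was deprecated in C++23.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>

// Hypothetical wrapper: the library type provides suitably aligned raw
// storage, and the T is constructed into it with placement new.
template <typename T, std::size_t alignment = 64>
class AlignedBox {
    typename std::aligned_storage<sizeof(T), alignment>::type storage;
public:
    explicit AlignedBox(T const & x = T()) { new (&storage) T(x); }
    ~AlignedBox() { value().~T(); }

    // Copying the raw bytes would be wrong for non-trivial T, so it is disabled here.
    AlignedBox(AlignedBox const &) = delete;
    AlignedBox & operator=(AlignedBox const &) = delete;

    // The object lives at the start of the storage, so no rounding is needed.
    T & value() { return *reinterpret_cast<T *>(&storage); }
    T const & value() const { return *reinterpret_cast<T const *>(&storage); }
};
```

As with `alignas`, nothing beyond the storage itself is kept in the object, and access is a plain cast of the storage address.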

score 1 · Accepted Answer · answered Jun 07 '13 at 16:51

I think option 1 looks neater, and I don't think there is any benefit with option 2.

However, if you need to know which gives the best performance, you have to measure it by actually running the code. Someone looking at the code and saying "A looks better than B" is no good, whether that is me or anyone else: compilers aren't 100% predictable, and sometimes the choice that looks good isn't the best one. I say this about all performance posts, and for good reason. I have personally looked at two pieces of code and thought "these are almost identical, so they will take the same time", only to find that some subtle difference made one noticeably faster than the other.

Make sure you don't just test the trivial case here. You need a few different variations, such as a struct with a fair number of members and large and small arrays, as well as simple types like int, long long and double.
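
A very rough way to take such measurements is sketched below using `std::chrono`. `elapsed_ns` and `time_increment` are hypothetical helpers, and this is only a starting point, not a rigorous benchmark (no warm-up, no repeated runs, no statistics).

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: calls work(i) `iterations` times and returns the
// elapsed wall-clock time in nanoseconds.
template <typename Work>
long long elapsed_ns(Work work, std::size_t iterations) {
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        work(i);
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Example workload: writes go to a volatile variable so the optimizer
// cannot remove the loop body.
inline long long time_increment(std::size_t iterations) {
    volatile long long sink = 0;
    return elapsed_ns(
        [&sink](std::size_t i) { sink = sink + static_cast<long long>(i); },
        iterations);
}
```

To compare the two options, pass one lambda that reads and writes through each wrapper, run both with the same iteration count and optimization flags, and repeat for each of the T variations listed above.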
