11

Apologies in advance for what may be a silly first post on well-trodden ground. While there is plenty of material on the subject, very little of it is definitive and/or intelligible to me.

I have an AlignedArray template class to dynamically allocate memory on the heap with arbitrary alignment (I need 32-byte alignment for AVX assembly routines). This requires some ugly pointer manipulation.

Agner Fog provides a sample class in cppexamples.zip that abuses a union to do so (http://www.agner.org/optimize/optimization_manuals.zip). However, I know that writing to one member of a union and then reading from another results in UB.

AFAICT it is safe to alias any pointer type to a char *, but only in one direction. This is where my understanding gets fuzzy. Here's an abridged version of my AlignedArray class (essentially a rewrite of Agner's, to help my understanding):

template <typename T, size_t alignment = 32>
class AlignedArray
{
    size_t m_size;
    char * m_unaligned;
    T * m_aligned;

public:
    AlignedArray (size_t const size)
        : m_size(0)
        , m_unaligned(0)
        , m_aligned(0)
    {
        this->size(size);
    }

    ~AlignedArray ()
    {
        this->size(0);
    }

    T const & operator [] (size_t const i) const { return m_aligned[i]; }

    T & operator [] (size_t const i) { return m_aligned[i]; }

    size_t const size () { return m_size; }

    void size (size_t const size)
    {
        if (size > 0)
        {
            if (size != m_size)
            {
                char * unaligned = 0;
                unaligned = new char [size * sizeof(T) + alignment - 1];
                if (unaligned)
                {
                    // Agner:
                    /*
                    union {
                        char * c;
                        T * t;
                        size_t s;
                    } aligned;
                    aligned.c = unaligned + alignment - 1;
                    aligned.s &= ~(alignment - 1);
                    */

                    // Me:
                    T * aligned = reinterpret_cast<T *>((reinterpret_cast<size_t>(unaligned) + alignment - 1) & ~(alignment - 1));

                    if (m_unaligned)
                    {
                        // Agner:
                        //memcpy(aligned.c, m_aligned, std::min(size, m_size));

                        // Me:
                        memcpy(aligned, m_aligned, std::min(size, m_size));

                        delete [] m_unaligned;
                    }
                    m_size = size;
                    m_unaligned = unaligned;

                    // Agner:
                    //m_aligned = aligned.t;

                    // Me:
                    m_aligned = aligned;
                }
                return;
            }
            return;
        }
        if (m_unaligned)
        {
            delete [] m_unaligned;
            m_size = 0;
            m_unaligned = 0;
            m_aligned = 0;
        }
    }
};

So which method is safe(r)?

timrau
  • 22,578
  • 4
  • 51
  • 64
linguamachina
  • 5,785
  • 1
  • 22
  • 22
  • 3
    Instead of constructing `char` objects and then casting that to T, why don't you grab raw memory (from `operator new`, or even `malloc`), as `void*`, and actually construct `T` objects in it? Basically: if you want T objects, construct T objects. This use case (aligned array) has *zero* need for aliasing tricks/unions/memcpy/whatever. – R. Martinho Fernandes Mar 07 '13 at 15:25
  • @R.MartinhoFernandes: Except, math isn't allowed on `void *`s. How do you get an aligned `void *`? – Omnifarious Mar 07 '13 at 15:29
  • @Omnifarious Last I checked, math isn't allowed on `char*` either. (And even if it were, that would not mean you need to construct char objects and not construct T objects) You need integers to do math. The portable solution in C++11 is http://en.cppreference.com/w/cpp/memory/align. The theoretically-not-portable solution is to reinterpret_cast to a numeric type, do the math, and reinterpret_cast back. (it's quite portable in practice because in all implementations I know reinterpret_cast to numeric types behaves as expected) – R. Martinho Fernandes Mar 07 '13 at 15:30
  • the nice thing about `new` is that it produces a pointer to memory suitably aligned for any ordinary type. there comes possible problem when wandering outside the standard's reach, e.g. 128-bit or 256-bit thingies. but i guess there are special solutions for that. ah, yes, found in wikipedia AVX [intrinsic support in most compilers](http://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Compiler_and_assembler_support). – Cheers and hth. - Alf Mar 07 '13 at 15:31
  • 1
    @R.MartinhoFernandes: You are allowed to do math on `char *` as long as your math doesn't push it out of the allocated range. It's perfectly valid to add 1 to a pointer or subtract 1 from it. But, yeah, finding the alignment is going to require casting it to an integral type. So in that sense I'm wrong. – Omnifarious Mar 07 '13 at 15:34
  • @R. Martinho Fernandes: I'm a little confused... this use case has a clearly-defined need for such tricks. What if you need a 32 byte-aligned array of floats on the heap? – linguamachina Mar 07 '13 at 16:05
  • You get an 32-byte-aligned pointer and construct floats on it. If you construct floats, you don't need to point to them with char*. Why would you construct chars if you want floats? – R. Martinho Fernandes Mar 07 '13 at 16:07
  • Forgive my ignorance. If you don't have a compiler that supports C++11, how would you get a 32-byte aligned pointer without such tricks? (My post was misleading because I used `nullptr` - force of habit!) – linguamachina Mar 07 '13 at 16:10
  • Depending on what you are doing you might consider forcing alignment to a cache line (64 bytes on x86). This will avoid [false sharing](http://en.wikipedia.org/wiki/False_sharing) if you decide to make your algorithm multi-threaded. – amdn Mar 07 '13 at 16:21
  • @user as you do above in this line `T * aligned = reinterpret_cast((reinterpret_cast(unaligned) + alignment - 1) & ~(alignment - 1));`, but from a `void*`. There is no aliasing going on: construct Ts instead of chars, and point to it with T* or void* instead of char*. – R. Martinho Fernandes Mar 07 '13 at 16:29
  • While I think making an allocator out of Brett's code and then using regular containers is the better option, here's how I'd do this without any aliasing involved: http://coliru.stacked-crooked.com/view?id=0a0429b6792da87c36b29ce9d529cb52-18aa934a8d82d638dde2147aa94cac94 – R. Martinho Fernandes Mar 07 '13 at 16:36
  • Aha. That's blindingly obvious, sorry. Then I suppose Agner's approach requires `char*` in order to perform the math without casting to `size_t`... so why the hell would he recommend a union-based approach, when it most definitely could result in undefined behaviour!? Is it dangerous to `reinterpret_cast` to `size_t`? – linguamachina Mar 07 '13 at 16:36
  • No, it's safe. The result is not specified by the standard, but virtually all implementations behave the "right way" (because there's no point in doing otherwise). – R. Martinho Fernandes Mar 07 '13 at 16:47

2 Answers2

3

I have code that implements the (replacement) new and delete operators, suitable for SIMD (i.e., SSE / AVX). It uses the following functions that you might find useful:

static inline void *G0__SIMD_malloc (size_t size)
{
    constexpr size_t align = G0_SIMD_ALIGN;
    void *ptr, *uptr;

    static_assert(G0_SIMD_ALIGN >= sizeof(void *),
                  "insufficient alignment for pointer storage");

    static_assert((G0_SIMD_ALIGN & (G0_SIMD_ALIGN - 1)) == 0,
                  "G0_SIMD_ALIGN value must be a power of (2)");

    size += align; // raw pointer storage with alignment padding.

    if ((uptr = malloc(size)) == nullptr)
        return nullptr;

    // size_t addr = reinterpret_cast<size_t>(uptr);
    uintptr_t addr = reinterpret_cast<uintptr_t>(uptr);

    ptr = reinterpret_cast<void *>
        ((addr + align) & ~(align - 1));

    *(reinterpret_cast<void **>(ptr) - 1) = uptr; // (raw ptr)

    return ptr;
}


static inline void G0__SIMD_free (void *ptr)
{
    if (ptr != nullptr)
        free(*(reinterpret_cast<void **>(ptr) - 1)); // (raw ptr)
}

This should be easy to adapt. Obviously you would replace malloc and free, since you're using the global new and delete for raw (char) storage. It assumes that size_t is sufficiently wide for address arithmetic - true in practice, but uintptr_t from <cstdint> would be more correct.

Brett Hale
  • 21,653
  • 2
  • 61
  • 90
  • On POSIX system, there is posix_memalign() – BatchyX Mar 07 '13 at 16:16
  • Thanks for this, it is most helpful (and we're both doing essentially the same pointer munging). Is there any chance that either of our examples could result in undefined behaviour? I'm not sure whether or not we're violating strict aliasing rules here... – linguamachina Mar 07 '13 at 16:18
  • If you put this into an allocator, you can use the rest of the standard library easily: `vector`; no need to reinvent anything else. – R. Martinho Fernandes Mar 07 '13 at 16:31
  • @user2144521 - `char *` is assumed to alias any type. I'd have to check on the ordering rules for the `aligned.s &= ~(alignment - 1);` statement though... – Brett Hale Mar 07 '13 at 16:35
  • Here's a simple allocator based on this code along with usage with `std::vector`: http://coliru.stacked-crooked.com/view?id=654f0e83dfaa194a5f9da804ac2c4cc0-18aa934a8d82d638dde2147aa94cac94 – R. Martinho Fernandes Mar 07 '13 at 16:48
  • @R.MartinhoFernandes - you're right that `uintptr_t` should be used with C++11. My understanding is that this type is always defined, while the 'exact' types may not be. – Brett Hale Mar 07 '13 at 16:53
2

To answer your question, both of those methods are just as safe. The only two operations that are really stinky there are the cast to size_t and new char[stuff]. You should at least be using uintptr_t from <cstdint> for the first. The second operation creates your only pointer aliasing issue as technically the char constructor is run on each char element and that constitutes accessing the data through the char pointer. You should use malloc instead.

The other supposed 'pointer aliasing' isn't an issue. And that's because other than the new operation you aren't accessing any data through the aliased pointers. You are only accessing data through the T * you get after alignment.

Of course, you have to remember to construct all of your array elements. This is true even in your version. Who knows what kind of T people will put there. And, of course, if you do that, you'll have to remember to call their destructors, and have to remember to handle exceptions when you copy them (memcpy doesn't cut it).

If you have a particular C++11 feature, you do not need to do this. C++11 has a function specifically for aligning pointers to arbitrary boundaries. The interface is a little funky, but it should do the job. The call is ::std::align defined in <memory>.Thanks to R. Martinho Fernandes for pointing it out.

Here is a version of your function with the suggested fixed:

#include <cstdint>  // For uintptr_t
#include <cstdlib>  // For malloc
#include <algorithm>

template <typename T, size_t alignment = 32>
class AlignedArray
{
    size_t m_size;
    void * m_unaligned;
    T * m_aligned;

public:
    AlignedArray (size_t const size)
        : m_size(0)
        , m_unaligned(0)
        , m_aligned(0)
    {
        this->size(size);
    }

    ~AlignedArray ()
    {
        this->size(0);
    }

    T const & operator [] (size_t const i) const { return m_aligned[i]; }

    T & operator [] (size_t const i) { return m_aligned[i]; }

    size_t size() const { return m_size; }

    void size (size_t const size)
    {
        using ::std::uintptr_t;
        using ::std::malloc;

        if (size > 0)
        {
            if (size != m_size)
            {
                void * unaligned = 0;
                unaligned = malloc(size * sizeof(T) + alignment - 1);
                if (unaligned)
                {
                    T * aligned = reinterpret_cast<T *>((reinterpret_cast<uintptr_t>(unaligned) + alignment - 1) & ~(alignment - 1));

                    if (m_unaligned)
                    {
                        ::std::size_t constructed = 0;
                        const ::std::size_t num_to_copy = ::std::min(size, m_size);

                        try {
                            for (constructed = 0; constructed < num_to_copy; ++constructed) {
                                new(aligned + constructed) T(m_aligned[constructed]);
                            }
                            for (; constructed < size; ++constructed) {
                                new(aligned + constructed) T;
                            }
                        } catch (...) {
                            for (::std::size_t i = 0; i < constructed; ++i) {
                                aligned[i].T::~T();
                            }
                            ::std::free(unaligned);
                            throw;
                        }

                        for (size_t i = 0; i < m_size; ++i) {
                            m_aligned[i].T::~T();
                        }
                        free(m_unaligned);
                    }
                    m_size = size;
                    m_unaligned = unaligned;
                    m_aligned = aligned;
                }
            }
        } else if (m_unaligned) { // and size <= 0
            for (::std::size_t i = 0; i < m_size; ++i) {
                m_aligned[i].T::~T();
            }
            ::std::free(m_unaligned);
            m_size = 0;
            m_unaligned = 0;
            m_aligned = 0;
        }
    }
};
Community
  • 1
  • 1
Omnifarious
  • 54,333
  • 19
  • 131
  • 194