6

I was wondering about a possible way to make memory layout of a class to be more effective in templated code. As far as I know, Standard mandates data members of a class to be laid out in memory on order of their declaration. There might be possible padding done by the compiler to align the data members adding unnecessarily to the size of the class. The idea is to re-arrange data members declarations at compile time to minimize such padding. I did some searching, but couldn't find any info (most of the time people discuss packing compiler directives, which is not quite the same as I see it).

First, please consider the following (trivial, but repetitive and ugly) code (same code on ideone.com) (questions are below the code, feel free to skip right down to them):

#include <iostream>
#include <cstdint>

namespace so
{
template <typename Ta, typename Tb, typename Tc, std::size_t =
    ((sizeof(Ta) >= sizeof(Tb)) && (sizeof(Tb) >= sizeof(Tc))) ? 10 :
    ((sizeof(Ta) >= sizeof(Tc)) && (sizeof(Tc) >= sizeof(Tb))) ? 11 :
    ((sizeof(Tb) >= sizeof(Ta)) && (sizeof(Ta) >= sizeof(Tc))) ? 20 :
    ((sizeof(Tb) >= sizeof(Tc)) && (sizeof(Tc) >= sizeof(Ta))) ? 21 :
    ((sizeof(Tc) >= sizeof(Ta)) && (sizeof(Ta) >= sizeof(Tb))) ? 30 :
    ((sizeof(Tc) >= sizeof(Tb)) && (sizeof(Tb) >= sizeof(Ta))) ? 31 : 0>
struct foo {};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 10>
{
  Ta a;
  Tb b;
  Tc c;
  foo(Ta _a, Tb _b, Tc _c) : a{_a}, b{_b}, c{_c} {}
};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 11>
{
  Ta a;
  Tc c;
  Tb b;
  foo(Ta _a, Tb _b, Tc _c) : a{_a}, c{_c}, b{_b} {}
};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 20>
{
  Tb b;
  Ta a;
  Tc c;
  foo(Ta _a, Tb _b, Tc _c) : b{_b}, a{_a}, c{_c} {}
};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 21>
{
  Tb b;
  Tc c;
  Ta a;
  foo(Ta _a, Tb _b, Tc _c) : b{_b}, c{_c}, a{_a} {}
};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 30>
{
  Tc c;
  Ta a;
  Tb b;
  foo(Ta _a, Tb _b, Tc _c) : c{_c}, a{_a}, b{_b} {}
};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 31>
{
  Tc c;
  Tb b;
  Ta a;
  foo(Ta _a, Tb _b, Tc _c) : c{_c}, b{_b}, a{_a} {}
};

template <typename Ta, typename Tb, typename Tc>
struct bar: public foo<Ta, Tb, Tc>
{
 private:
  using base = foo<Ta, Tb, Tc>;
 public:
  bar() = default;
  using base::base;
};

template <typename Ta, typename Tb, typename Tc>
struct foobar
{
  Ta a;
  Tb b;
  Tc c;
  foobar() = default;
  foobar(Ta _a, Tb _b, Tc _c) : a{_a}, b{_b}, c{_c} {}
};
} //namespace so

int main()
{
 so::bar<std::uint16_t, std::uint32_t, std::uint16_t> bar{1, 2, 3};
 so::foobar<std::uint16_t, std::uint32_t, std::uint16_t> foobar{1, 2, 3};

 std::cout << sizeof(bar) << "\t" << sizeof(foobar) << std::endl;

 std::cout << bar.a << " : " << bar.b << " : " << bar.c << std::endl;
 std::cout << foobar.a << " : " << foobar.b << " : " << foobar.c << std::endl;

 return (0);
}

Program output:

8   12
1 : 2 : 3
1 : 2 : 3

Questions:

  1. Is there some well-known, compiler-independent way of solving such thing (Boost, maybe)?
  2. If no, is there some compiler-specific directives that will do such thing automatically (without data misalignment as with GCC's __atribute__((packed)))?
  3. Can this be done in a more generic way (possibly using variadic templates)?

Thanks in advance!

lapk
  • 3,838
  • 1
  • 23
  • 28
  • Hint: I believe you could solve the repetition issue by using inheritance, which opens the door to variadic templates, however this will heavily rely on the compiler and the code may not be as elegant (aka, you will have to use a function to access the element, such as `get<0>(foo)`). I would also note that you are only looking at the sizes, and not the alignments, on some architectures `double` is 8 bytes long, but aligned on 4 bytes boundaries, so your computation might be sub-optimal. – Matthieu M. Sep 24 '13 at 07:23
  • @MatthieuM. Do you mean sort of recursive inheritance as we unroll the parameter pack, and use some kind of `get<>` to navigate through it? Alignment can be dealt with `alignof()/std::alignment_of<>`, I imagine - thanks for pointing that out. – lapk Sep 24 '13 at 08:13
  • Actually, my main issue was that unfortunately you cannot use pack expansion in the attributes (annoying) and there are various ways to deal with it... and then I realized I already reviewed those ways when checking how `std::tuple` was implemented and that instead of trying to re-implement them using inheritance tricks I could simply reuse `std::tuple` :D – Matthieu M. Sep 24 '13 at 08:46

2 Answers2

5

I believe I have a relatively simple variadic template solution.

The implementation will require a couple helpers though, so I will present it backward so you can get the gist of it first.

template <typename... Args>
class OptimizedLayout {
public:
    template <size_t I>
    auto at() -> decltype(std::get<Index<I>::value>(_storage)) {
        return std::get<Index<I>::value>(_storage);
    }

    template <size_t I>
    auto at() const -> decltype(std::get<Index<I>::value>(_storage)) {
        return std::get<Index<I>::value>(_storage);
    }

private:
    using Indexed = /**/; // pairs of sorted Args (by decreasing size)
                          // and their original index in the argument list

    using Storage = /*std::tuple<Indexed.first ...>*/;

    template <size_t I>
    using Index = /*index of element of Indexed whose .second is I*/;

    Storage _storage;
}; // class OptimizedLayout

The main benefit here is that changing how to pack elements will only affect how Indexed is defined, so you can easily improve on the algorithm. For now, I will just provide an equivalent of your template.


Disclaimer: the following code is untested, it may not even compile, let alone produce the correct result.

I. Generating indices.

The explanation can be found on the lounge, we can reuse it to generate a pack of pairs (type, index). To sort it we are going to use a MPL algorithm, so it is simpler to produce the pack as a MPL vector.

template <std::size_t... Is>
struct indices {};

template <std::size_t N, std::size_t... Is>
struct build_indices
  : build_indices<N-1, N-1, Is...> {};

template <std::size_t... Is>
struct build_indices<0, Is...> { using type = indices<Is...>; };

template <typename Tuple, typename Indices>
struct IndexedImpl;

template <typename... T, size_t... I>
struct IndexedImpl< std::tuple<T...>, indices<I...> > {
    using type = boost::mpl::vector< std::pair<T, I>... >;
};

template <typename Tuple>
using Indexed =
    IndexedImpl<Tuple, typename build_indices<std::tuple_size<Tuple>::value>::type>;

II. Sorting

In order to sort we are going to use the MPL sort algorithm, which operates on types.

struct GreaterSize {
    template <typename T, typename U>
    struct apply {
         using type = boost::mpl::bool_<sizeof(T) > sizeof(U)>;
    };
};

template <typename T>
struct TupleInserter {
    using state = T;

    template <typename Seq, typename E>
    struct apply;
};

template <typename T>
template <typename... Args, typename E>
struct TupleInserter<T>::apply<std::tuple<Args...>, E> {
    using type = std::tuple<Args..., E>;
};

template <typename Tuple>
using SortedSequence = boost::mpl::sort<
    typename Indexed<Tuple>::type,
    GreaterSize,
    TupleInserter
>;

III. Computing the Storage class

Now, we only need to compute the storage class which is done by extracting the first element of each pair. Interestingly, pattern matching can really assist here.

template <typename T>
struct TupleFirstExtractor;

template <typename... T, size_t... I>
struct TupleFirstExtractor<std::tuple<std::pair<T, I>...>> {
    using type = std::tuple<T...>;
}; 

IV. Computing the Index solver

template <typename Tuple, size_t Needle, size_t Acc>
struct IndexFinderImpl;

template <typename H, size_t h, typename... Tail, size_t Needle, size_t Acc>
struct IndexFinderImpl<std::tuple<std::pair<H,h>, Tail...>, Needle, Acc>:
    IndexFinderImpl<std::tuple<Tail...>, Needle, Acc+1> {};

template <typename H, typename... Tail, size_t Needle, size_t Acc>
struct IndexFinderImpl<std::tuple<std::pair<H, Needle>, Tail...>, Needle, Acc>:
    std::integral_constant<size_t, Acc> {};

V. Putting it all together

And now we wire up everything:

template <typename... Args>
class OptimizedLayout {
public:
    template <size_t I>
    auto at() -> decltype(std::get<Index<I>::value>(_storage)) {
        return std::get<Index<I>::value>(_storage);
    }

    template <size_t I>
    auto at() const -> decltype(std::get<Index<I>::value>(_storage)) {
        return std::get<Index<I>::value>(_storage);
    }

private:
    using Indexed = typename SortedSequence<std::tuple<Args...>>::type;

    using Storage = typename TupleFirstExtractor<Indexed>::type;

    template <size_t I>
    using Index = IndexFinderImpl<Indexed, I, 0>;

    Storage _storage;
}; // class OptimizedLayout

Hint: I advise using a specialized namespace to hold all the helpers. Whilst they could be defined within the template, it is easier to define them outside since they do not depend on Args..., however you would need to isolate them to avoid clashes with other parts of your program.

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
  • **+1** Thanks for such a detailed break down. I am going to look into it as soon as I have a bit more time and try to implement it fully. I think, after that I should be able to update the question and accept your answer. And once again, thanks for pointing out the deficiency of `sizeof()` use in this case. You are absolutely right, it should be `alignof()` instead. – lapk Sep 24 '13 at 17:52
  • @PetrBudnik: note that after reading R. Martinho Fernandes' blog serie I realized that there is no guarantee that `std::tuple` lays the elements in the order you pass them. So I would actually recommend that you take his implementation; though I'll let this answer here because it is small enough to be easily groked. – Matthieu M. Sep 24 '13 at 18:29
  • Yes, I read the part comparing `libc++` and `libstdc++` `std::tuple`'s layout (and about GCC 4.7.2 bug). But you made a comprehensive, swift post, covering the logic of it in detail and giving a few useful references. And I actually like that it leaves room for me to try things. I just want to give it justice by actually doing it (and maybe asking a few questions in the process). – lapk Sep 24 '13 at 20:46
  • But, really, "you got me at" `alignof()`. Made me think about something like `char[10]` passed as a template parameter and that, in fact, your "_`sizeof()` might be sub-optimal_" line from the original comment was far more than generous ;). – lapk Sep 24 '13 at 20:46
  • @PetrBudnik: Actually, most of the times you can get away with ordering the elements by decreasing size, so even though it's not perfect, it is still pretty reliable. – Matthieu M. Sep 25 '13 at 06:11
2

Take a look at this series of blog posts by R. Martinho Fernandes : Link.

It outlines optimal packing of tuples. You can use such a "packed" tuple as the data storage for you class, and provide accessors hiding the get<0>() style tuple element access.

Glorfindel
  • 21,988
  • 13
  • 81
  • 109
mitchnull
  • 6,161
  • 2
  • 31
  • 23
  • Very interesting serie, however a quick outline of the challenges/solutions would be nice to flesh out this answer. – Matthieu M. Sep 24 '13 at 09:07
  • yes, well, I'm not the author of those posts, and I wouldn't like to present the solution as mine... maybe I should've posted this as a comment instead... – mitchnull Sep 24 '13 at 09:19
  • I certainly don't mind it being an answer, it was a most interesting read and actually points toward a flaw in my own answer: I had blindly assumed that `std::tuple` would be layed out in order which is unfortunately not portable. – Matthieu M. Sep 24 '13 at 09:26
  • **+1** Interesting read, thanks for sharing. Going to go through the details as soon as I have a bit more time. – lapk Sep 24 '13 at 17:45