I am building a very fast multi-threaded discrete event simulation framework. Its core uses atomics and lock-free programming techniques to run quickly across many threads. To avoid cache line contention (false sharing), I align some variables to cache lines and pad out the remaining cache line space. Here is how I do it:

// compute cache line padding size
constexpr u64 CLPAD(u64 _objSize) {
  return ((_objSize / CACHELINE_SIZE) * CACHELINE_SIZE) +
      (((_objSize % CACHELINE_SIZE) > 0) * CACHELINE_SIZE) -
      _objSize;
}

alignas(CACHELINE_SIZE) MyObject myObj;
char padding[CLPAD(sizeof(myObj))];

This works great for me, but I stumbled upon an issue today when using this approach for a new object type. The CLPAD() function returns the number of chars needed to pad the input type up to the next cache line boundary. However, if I pass in a type whose size is an exact multiple of the cache line size, CLPAD() returns 0. If you attempt to create a zero-sized array, you get this warning/error:

ISO C++ forbids zero-size array 'padding'
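
To make the zero case concrete, here is a small sketch of the arithmetic. It assumes `CACHELINE_SIZE` is 64 and that `u64` is an alias for `std::uint64_t`; neither definition is shown above, so both are assumptions made only for illustration.

```cpp
#include <cstdint>

using u64 = std::uint64_t;           // assumption: u64 is a 64-bit unsigned type
constexpr u64 CACHELINE_SIZE = 64;   // assumption: 64-byte cache lines

// Same computation as above: round the size up to the next multiple of
// CACHELINE_SIZE, then subtract the size to get the padding.
constexpr u64 CLPAD(u64 _objSize) {
  return ((_objSize / CACHELINE_SIZE) * CACHELINE_SIZE) +
      (((_objSize % CACHELINE_SIZE) > 0) * CACHELINE_SIZE) -
      _objSize;
}

static_assert(CLPAD(1) == 63, "1 byte needs 63 bytes of padding");
static_assert(CLPAD(40) == 24, "40 bytes need 24 bytes of padding");
static_assert(CLPAD(100) == 28, "100 bytes round up to 128");
static_assert(CLPAD(64) == 0, "exact multiple: padding is zero");
static_assert(CLPAD(128) == 0, "exact multiple: padding is zero");
```

The last two cases are the ones that would produce `char padding[0]`.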

I know I could modify CLPAD() to return CACHELINE_SIZE in this case, but then I'm burning a cache line worth of space for no reason.

How can I make the declaration of 'padding' disappear if CLPAD returns 0?

nic

1 Answer

Taking a page from std::aligned_storage<>, I've come up with the following:

template<class T, bool = false>
struct padded
{
    using type = struct
    {
        alignas(CACHELINE_SIZE)T myObj;
        char padding[CLPAD(sizeof(T))];
    };
};

template<class T>
struct padded<T, true>
{
    using type = struct
    {
        alignas(CACHELINE_SIZE)T myObj;
    };
};

template<class T>
using padded_t = typename padded<T, (sizeof(T) % CACHELINE_SIZE == 0)>::type;

Usage:

struct alignas(32) my_type_1 { char c[32]; }; // char c[32] to silence MSVC warning
struct my_type_2 { char c[CACHELINE_SIZE * 2]; }; // ditto

int main()
{
    padded_t<my_type_1> pt0;
    padded_t<my_type_2> pt1;

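    // The values below assume CACHELINE_SIZE == 128.
    sizeof(pt0);              // 128
    alignof(decltype(pt0));   // 128 (standard alignof takes a type, not an expression)

    sizeof(pt1);              // 256
    alignof(decltype(pt1));   // 128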
}

You can provide a function to access myObj however you wish.
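
One possible way to do that is sketched below. It is a variation on the code above, not the same code: it drops the nested `type` so the struct can carry member functions, and it moves the size check into a default template argument. The `get()` accessors and the `counter` and `two_lines` test types are hypothetical names, and `CACHELINE_SIZE` is assumed to be 64.

```cpp
#include <cstdint>

using u64 = std::uint64_t;           // assumption: u64 is a 64-bit unsigned type
constexpr u64 CACHELINE_SIZE = 64;   // assumption: 64-byte cache lines

// CLPAD as defined in the question.
constexpr u64 CLPAD(u64 _objSize) {
  return ((_objSize / CACHELINE_SIZE) * CACHELINE_SIZE) +
      (((_objSize % CACHELINE_SIZE) > 0) * CACHELINE_SIZE) -
      _objSize;
}

// Primary template: sizeof(T) is not a multiple of the cache line, so pad.
template <class T, bool = (sizeof(T) % CACHELINE_SIZE == 0)>
struct padded {
  alignas(CACHELINE_SIZE) T myObj;
  char padding[CLPAD(sizeof(T))];

  T& get() { return myObj; }              // hypothetical accessor
  const T& get() const { return myObj; }
};

// Specialization: sizeof(T) is an exact multiple, so no padding member.
template <class T>
struct padded<T, true> {
  alignas(CACHELINE_SIZE) T myObj;

  T& get() { return myObj; }              // hypothetical accessor
  const T& get() const { return myObj; }
};

struct counter { long value; };                    // hypothetical small type
struct two_lines { char c[CACHELINE_SIZE * 2]; };  // hypothetical two-line type

static_assert(sizeof(padded<counter>) == CACHELINE_SIZE, "padded to one line");
static_assert(alignof(padded<counter>) == CACHELINE_SIZE, "line aligned");
static_assert(sizeof(padded<two_lines>) == 2 * CACHELINE_SIZE, "no extra line");
```

With this shape, `padded<counter> c; c.get().value = 1;` reads and writes the wrapped object without naming `myObj` directly.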

user2296177
  • Any reason to use nested structures? You could just use `padded` without nested `type` (unless this is some metaprogramming trick) – Andrey Turkin May 19 '17 at 07:11
  • @AndreyTurkin Indeed, you could go without a nested type. However the type of `padded<>` contains template parameters, while the type alias provides an anonymous structure that does *not* have template parameters. – user2296177 May 19 '17 at 07:21
  • So what's the gain of doing that? Number of types gets smaller? Symbol table gets smaller? Object size gets smaller? – Andrey Turkin May 19 '17 at 07:29
  • Is the "alignof(T) % CACHELINE_SIZE" supposed to be "sizeof(T) % CACHELINE_SIZE" ? – nic May 19 '17 at 18:36
  • @nic yes, indeed! Since that's what your actual problem describes. Glad you noticed my oversight. That's why I made the size a multiple of the cache line. – user2296177 May 19 '17 at 18:45
  • @AndreyTurkin I'm not experienced enough to answer all of your questions. I do believe it to be cleaner that way and similar to the way the standard does it (since I based my design on aligned storage). Object size will not be smaller/larger. I don't like the idea of carrying around type information that I don't need, but I'm not sure if there are other advantages. – user2296177 May 19 '17 at 18:47