
I'm working with C++11 on a project and here is some code:

#include <thread>
#include <vector>

void task1(int* res) {
    *res = 1;
}

void task2(int* res) {
    *res = 2;
}

int func() {
    std::vector<int> res(2, 0); // {0, 0}
    std::thread t1(task1, &res[0]);
    std::thread t2(task2, &res[1]);
    t1.join();
    t2.join();
    return res[0] + res[1];
}

The function is just like that. As you can see, there is a std::vector that stores the results of the threads.

My question is: can std::vector cause false sharing? If it can, is there any method to avoid false sharing while using std::vector to store the results of threads?

John Kugelman
Yves
  • Yes, and you would need each element to be on its own cache line. You could create an allocator to decide how memory should be allocated. – NathanOliver Jan 11 '22 at 03:18
  • Undefined behaviour occurs as a result of unsequenced operations that need to be sequenced. In your code, as it stands, the two tasks do not access the same object (`int`), so there are no unsequenced operations that need to be sequenced and the only way for false sharing (e.g. incoherency between cache lines that affects program behaviour if the two tasks run on distinct CPUs) would be a compiler bug. If you change the code so (say) the two tasks modify the *same* `int` (e.g. `task1()` does `res[1] = 5`) then all bets are off. – Peter Jan 11 '22 at 03:24
  • @Peter I kind of don't understand. So you are saying that in my piece of code, the compiler should be smart enough to avoid false sharing, and if not, it would be a compiler bug? – Yves Jan 11 '22 at 03:55
  • @Peter: I disagree with your previous comment. [False sharing](https://en.wikipedia.org/wiki/False_sharing) occurs if several threads write to the same cache line, so that the CPU must assume that sharing has occurred, even though it hasn't. This is likely what happens in OP's code. I strongly disagree with your statement that a "compiler bug" would be necessary for false sharing to occur in OP's code. – Andreas Wenzel Jan 11 '22 at 05:04
  • @Peter: The C++ memory model more or less requires that it run on a machine with coherent caches between cores. As long as the two objects are distinct, it must be possible to access them concurrently and have everything work, without data races - even if they happen to be in the same cache line. And on a coherent cache machine, having the two ints share a cache line is only a problem for performance (cache ping pong) but not for correctness. But it's the performance hit that OP wants to avoid. – Nate Eldredge Jan 11 '22 at 05:33
  • @Peter: You're right that on a machine without coherent caches, the compiler would have to put every object, no matter how small, in a separate cache line, and failing to do so would be a bug. But this would be incredibly wasteful of memory and so we are not likely to run C++ programs on such a machine. – Nate Eldredge Jan 11 '22 at 05:35

2 Answers

6

can std::vector cause false sharing?

Containers aren't something that "cause" false sharing. It's writing to objects that may cause false sharing. Specifically, writing in one thread to an object that is in the same "cache line" as another object that is accessed in another thread causes false sharing.

Elements of an array are adjacent in memory, and hence small adjacent elements of an array are very likely in the same cache line. A vector is an array-based data structure. The pattern of access to the vector's elements in your example is a good example of false sharing.
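To see this concretely: `std::vector<int>` stores its elements contiguously, so `&res[0]` and `&res[1]` are exactly `sizeof(int)` bytes apart. A small sketch of helpers to check this (the 64-byte line size is an assumption, typical on x86-64 but not guaranteed; `byte_gap` and `same_line` are illustrative names):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Distance in bytes between two addresses.
inline std::size_t byte_gap(const void* p, const void* q) {
    return reinterpret_cast<std::uintptr_t>(q) -
           reinterpret_cast<std::uintptr_t>(p);
}

// True if both addresses fall on the same cache line of `line` bytes.
// 64 is typical on x86-64, but that is an assumption, not a guarantee.
inline bool same_line(const void* p, const void* q, std::size_t line = 64) {
    return reinterpret_cast<std::uintptr_t>(p) / line ==
           reinterpret_cast<std::uintptr_t>(q) / line;
}
```

For `std::vector<int> res(2, 0)`, `byte_gap(&res[0], &res[1])` is exactly `sizeof(int)`, so unless the first element happens to end precisely on a line boundary, `same_line(&res[0], &res[1])` will be true.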

is there any method to avoid false sharing while using std::vector to store the results of threads?

Don't write into adjacent small elements of an array (or a vector) from multiple threads. Ways to avoid it are:

  • Divide the array into contiguous segments and access each individual segment from only one thread. Each segment must be at least as large as a cache line on the target system.
  • Or, write into separate containers, and merge them after the threads have finished.
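The second option can be sketched like this (a minimal sketch; `run_with_separate_vectors` is a hypothetical name, and note that the two vector objects themselves are adjacent locals, so heavy `push_back` traffic could still share a line between their bookkeeping members; for a handful of writes this is usually negligible, and the element storage lives in separate heap blocks):

```cpp
#include <thread>
#include <vector>

// Sketch: each thread writes into its own vector (a separate heap
// allocation), and the results are merged only after both joins.
int run_with_separate_vectors() {
    std::vector<int> out1, out2;               // separate containers
    std::thread t1([&out1] { out1.push_back(1); });
    std::thread t2([&out2] { out2.push_back(2); });
    t1.join();
    t2.join();
    std::vector<int> res;                      // merge after the threads finish
    res.insert(res.end(), out1.begin(), out1.end());
    res.insert(res.end(), out2.begin(), out2.end());
    return res[0] + res[1];                    // 1 + 2
}
```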
eerorika
  • So, if I use two separate containers, they won't cause false sharing even if they are declared together in the same function? e.g. `std::vector sub1(...), sub2(...);`. In this case, multi-threaded writing into `sub1` and `sub2` won't cause false sharing even if they are declared together? – Yves Jan 11 '22 at 03:51
  • @Yves If you were to do something that modifies the vector object itself such as push_back, then false sharing would be possible since those vectors would likely be in the same cache line. But the elements themselves wouldn't be, so modifying them wouldn't be a problem. – eerorika Jan 11 '22 at 03:56
  • I am not sure if it is safe to rely on the contents of two `std::vector` containers to be on different cache lines. For example, according to [this link](https://prog21.dadgum.com/179.html), the "smallest allowed allocation" of dlmalloc on a 64-bit system is 32 bytes, which is less than the size of a cache line on most platforms. Therefore, depending on the implementation, it seems quite possible for them to be on the same cache line. – Andreas Wenzel Jan 11 '22 at 04:51
4

Yes, if you write to two adjacent int elements inside a std::vector, it is likely that they will both be on the same cache line, which will cause false sharing if this cache line is accessed simultaneously by two different threads.

C++17 introduced std::hardware_destructive_interference_size, which is a portable way to get a hint from the compiler on what the L1 cache line size is expected to be on the target platform.

Therefore, to prevent false sharing, you should ensure that the two int variables are at least std::hardware_destructive_interference_size bytes apart:

int func() {

    constexpr int min_offset = std::hardware_destructive_interference_size / sizeof(int);

    std::vector<int> res( min_offset + 1, 0 );
    std::thread t1( task1, &res[0] );
    std::thread t2( task2, &res[min_offset] );
    t1.join();
    t2.join();
    return res[0] + res[min_offset];
}

However, at the time of this writing, several compilers do not (yet) support std::hardware_destructive_interference_size. See this question for further information.

If you want to be reasonably certain that your code will not have false sharing in the distant future, then you may want to assume that the cache line size is double the value reported by std::hardware_destructive_interference_size.

Andreas Wenzel
  • This seems overly pessimistic. Shouldn’t the number of elements in the vector be based on `hardware_destructive_interference_size / sizeof (int)`? +1. – Pete Becker Jan 11 '22 at 03:57
  • Not L1 cache size, but L1 cache **line** size. Also, note that neither libstdc++ nor libc++ have implemented it. – eerorika Jan 11 '22 at 04:02
  • @PeteBecker: Yes, you are right. I forgot to divide by `sizeof(int)`. I believe that I have fixed it now. – Andreas Wenzel Jan 11 '22 at 04:03
  • @eerorika: Yes, you are right. I intended to write "cache line size". I have fixed it now. – Andreas Wenzel Jan 11 '22 at 04:06
  • Another option is to wrap your `int` in a struct type defined with `alignas(std::hardware_destructive_interference_size)`, and have a vector of those instead. Then each one gets a whole cache line to itself. In effect you have a vector of cache lines. – Nate Eldredge Jan 11 '22 at 05:24