If I have a C++ program with OpenMP parallelization, where different threads constantly use some small shared array only for reading data from it, does false sharing occur in this case? in other words, is false sharing related only to memory write operations, or it can also happen with memory read operations.
-
afaik false sharing is about unnecessary reloads of data that didn't change, because it sits on the same cache line as data that did change. If nothing is written, there is no need to reload anything, thus no false sharing – 463035818_is_not_an_ai Jul 06 '17 at 09:29
-
see here: https://en.wikipedia.org/wiki/False_sharing – 463035818_is_not_an_ai Jul 06 '17 at 09:30
-
I had in mind the false-sharing case where several threads work with the same cache line and the processor has to synchronize them in order to maintain coherency, as described here: http://www.drdobbs.com/architecture-and-design/sharing-is-the-root-of-all-contention/214100002 – John Smith Jul 06 '17 at 10:21
2 Answers
Commonly used cache coherence protocols, such as MESI (modified, exclusive, shared, invalid), have a specific state for cache lines called "shared". Cache lines are in this state when they are read by multiple processors. Each processor then has its own copy of the cache line and can safely read from it without false sharing. On a write, all other processors are told to invalidate the cache line, which is the main cause of false sharing.

-
does that mean that there will be no performance gain if I create local copies of this array for each thread? – John Smith Jul 06 '17 at 10:24
-
It depends on the actual underlying hardware, but I think there will be no performance gain. I suggest writing a micro-benchmark – max Jul 06 '17 at 11:03
-
I've thought a little about it. Copying the data might even be worse because of higher cache usage. Mostly only the small L1 cache is core-private, and with inclusive caches you end up with redundant copies of the data in the higher-level caches. – max Jul 07 '17 at 07:40
-
I've run a benchmark (calculating the convolution of a 2D array with a small Gaussian kernel, where in one version the Gaussian kernel was created inside each thread and in the other version it was shared). In my case the difference is too small: sometimes one version seems a little better, and another run shows the opposite. – John Smith Jul 07 '17 at 09:25
False sharing is a performance issue because it causes additional movement of a cache line which takes time. When two variables which are not really shared reside in the same line and separate threads update each of them, the line has to bounce around the machine which increases the latency of each access. In this case if the variables were in separate lines each thread would keep a locally modified copy of "its" line and no data movement would be required.
However, if you are not updating a line, then no data movement is necessary and there is no performance impact from the sharing, beyond the fact that the line could instead have held data the thread does need rather than data it doesn't. That is a small, second-order effect, though, so unless you know you are cache-capacity limited, ignore it!
