static char const err_msg[] = "Hell has frozen over.";

For a while I thought it was fine to share const variables like the one above between threads, but then it occurred to me that unless such variables both start and end exactly on a cacheline boundary, any adjacent non-const data could cause false sharing, leading to any of the performance penalties that tends to entail.

Whether that concern is valid would, I assume, depend on how the C language and/or compilers determine where space is allocated for static (const) variables. Nonetheless, to minimize the chances of false sharing, I guess it's best to declare all static variables as thread_local in a multi-threading context, even if they are const:

thread_local static char const err_msg[] = "Hell has frozen over.";

Can you corroborate this?

@MichaelDorgan mentioned there are platforms where "there is an additional cost associated with accessing thread local variable and a limit on how many can be declared". Any references corroborating that could affect my assumptions above.

@JonathanLeffler mentioned that const variables tend to be laid out in read-only memory regions, which would eliminate false sharing concerns. A follow-up question in that regard would then be: is this strictly platform dependent, or are there stronger guarantees available?
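As a rough way to check the layout question on a particular platform, a sketch like the following prints where a const and a non-const static end up and tests whether they fall into the same cache line. It assumes 64-byte cache lines, which is common but not guaranteed; `CACHE_LINE_SIZE`, `counter`, `same_cache_line`, and `report_layout` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 64-byte cache lines (typical on x86-64, not guaranteed). */
#define CACHE_LINE_SIZE 64
static_assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1)) == 0,
              "cache line size is expected to be a power of two");

static char const err_msg[] = "Hell has frozen over.";
static int counter = 0; /* hypothetical writable static for comparison */

/* True if both addresses fall into the same cache-line-sized block. */
bool same_cache_line(void const *a, void const *b)
{
    return (uintptr_t)a / CACHE_LINE_SIZE == (uintptr_t)b / CACHE_LINE_SIZE;
}

/* Prints both addresses and returns whether they share a cache line. */
bool report_layout(void)
{
    bool shared = same_cache_line(err_msg, &counter);
    printf("err_msg at %p\n", (void const *)err_msg);
    printf("counter at %p\n", (void const *)&counter);
    printf("same cache line: %s\n", shared ? "yes" : "no");
    return shared;
}
```

On common ELF toolchains one would expect `err_msg` to land in a read-only section and `counter` in a writable one, so the check would normally report no shared line. That is an observation about typical builds, not something the C standard guarantees.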

Will
  • On systems I work on, there is an additional cost associated with accessing thread local variable and a limit on how many can be declared. I think you may be reaching a bit with your solution here. If you determine through careful profiling that a group of strings are hurting you, by all means group them together for improved locality. Better, use LTO/PGO to let the compiler help you a bit with this. Your solution looks troubling to me, but perhaps others can prove me wrong. – Michael Dorgan Jun 27 '19 at 23:28
  • @MichaelDorgan Interesting! It would be quite valuable to me to see a reference for the cost and limit of thread locals you speak of, because that's an aspect I'd say I'm not sufficiently familiar with yet. – Will Jun 27 '19 at 23:38
  • OK; I don't think I know what you mean by _false sharing_; it sounds like a false problem, but in the days of Meltdown or Spectre, etc, maybe it isn't. OTOH, if the data is constant, I'm not sure what you might falsely share. I'll delete my comment since it isn't helping you. This will vanish shortly too. Note that const data tends to be stored in a readonly segment; there is no adjacent modifiable data, at least not in general. – Jonathan Leffler Jun 27 '19 at 23:52
  • Those non-const variables are not going to be adjacent to your const variables — they're typically in different segments (text segment vs data segment) of your program. – Jonathan Leffler Jun 27 '19 at 23:56
  • This sounds like: should I make all variables static in case I return a pointer to them? – KamilCuk Jun 27 '19 at 23:56
  • @JonathanLeffler *Note that const data tends to be stored in a readonly segment* <- that's exactly the kind of thing I'd like to see a reference/assurance of in a potential *answer*. ;) – Will Jun 27 '19 at 23:57
  • @KamilCuk How on earth did you make that leap? – Will Jun 27 '19 at 23:59
  • Unless you go tinkering with the loader layout, that separation happens on mainstream server, desktop, laptop systems. I don't have references on hand; you can look them up as well as I can (or, at least, you have the same search engines available to you as I have — I might get there quicker if I tried, but that's definitely not a given). You can study the layout of your program — print addresses of variables, etc. But I'm moderately confident that you'll be OK with no _false sharing_ problems. – Jonathan Leffler Jun 28 '19 at 00:00
  • The link to Wikipedia's article, which has a notice about more citations needed, is helpful. There's a big difference between non-const and const-qualified data here. What the example shows is different from what's in your question. If you have (non-const) data per thread, it could be allocated with `malloc()` or you could use mutexes to control access to it, or you could use thread-local storage. TLS has to be set up when a thread is created. It will usually be a copy of the TLS in the thread that creates the new thread. Variables on the stack are inherently thread-local and more efficient. – Jonathan Leffler Jun 28 '19 at 00:21
  • False sharing happens when *writing* disables a remote cache line. You cannot *write* to const. The programming environment forbids it at compile time and the system at run-time. So this cannot happen. – Alain Merigot Jun 28 '19 at 00:27
  • @AlainMerigot I believe I chose my words carefully: "any **adjacent** non-const data could cause false sharing". *Obviously* you cannot write to a const variable. However, let's assume the existence of 2 adjacent-in-memory variables: 8-byte const variable CON, and 8-byte non-const variable VAR. Being only 16 bytes combined, they end up sharing the same 64-byte cacheline CL. Both thread 1 and 2 have CL in their CPU cache because they reference CON. Now thread 1 writes to VAR. Result: false sharing of VAR with thread 2! – Will Jun 28 '19 at 00:43
  • The const data would be in read-only pages on, for example, Linux. There would not be writable data on the same page. This only applies to static storage duration. – Antti Haapala -- Слава Україні Jun 28 '19 at 01:39
  • According to this: https://software.intel.com/en-us/blogs/2011/05/02/the-hidden-performance-cost-of-accessing-thread-local-variables the cost should be relatively low. Regardless, if you are worried about false sharing, why not add some padding to completely fill the last cache line? alignas(std::hardware_destructive_interference_size) in C++17 should be enough. – CuriouslyRecurringThoughts Jun 28 '19 at 05:33
  • This sounds like premature optimization to me. In my experience the effects of false sharing and of setting up thread-local storage can be at least one order of magnitude apart. Even if the implementation of thread-local storage were ideal, it generally requires one additional indirection, because there is this thread dependency that has to be resolved. – Jens Gustedt Jun 28 '19 at 07:43
  • Towards TLS cost: Imagine that you want to access some const data. That's easy: just load the pointer to the data and read. Imagine this is TLS data and that you only have a limited number of slots (not uncommon). Now, you need to find which TLS slot your data is on and dereference that base pointer. Then add your offset to the pointer you loaded, dereference again, and start reading your string. This adds at least 1 extra indirection to your work. I can think of cases where additional indirections would be needed as well. Just read your string from .text and pad if needed. – Michael Dorgan Jun 28 '19 at 19:02
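
The padding idea raised in the comments can be sketched in C11 roughly as follows. C has no counterpart to C++17's `std::hardware_destructive_interference_size`, so this sketch assumes a 64-byte cache line; `CACHE_LINE_SIZE`, `struct padded_msg`, `padded_err_msg`, and `get_err_msg` are hypothetical names used only for illustration. Wrapping the string in a struct whose member is over-aligned makes the object both start on a cache-line boundary and occupy a whole number of lines, because a struct's size is rounded up to a multiple of its alignment.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines; adjust for the target platform. */
#define CACHE_LINE_SIZE 64

/* The over-aligned member forces the whole struct to cache-line
 * alignment, and sizeof is rounded up to a multiple of that alignment. */
struct padded_msg {
    alignas(CACHE_LINE_SIZE) char text[sizeof "Hell has frozen over."];
};

static struct padded_msg const padded_err_msg = { "Hell has frozen over." };

/* Compile-time checks that the object starts and ends on line boundaries. */
static_assert(alignof(struct padded_msg) == CACHE_LINE_SIZE,
              "object must start on a cache-line boundary");
static_assert(sizeof(struct padded_msg) % CACHE_LINE_SIZE == 0,
              "object must occupy whole cache lines");

/* Accessor so callers keep using a plain string pointer. */
char const *get_err_msg(void)
{
    return padded_err_msg.text;
}
```

With this layout no other object can share a cache line with the string, whatever section the linker places it in, and readers avoid the extra indirection that thread-local storage may add.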

0 Answers