c++11 thread_local keyword support in visual studio 11

Question

So there's a list of C++11 features supported by Visual Studio.

thread_local support is marked as partial. I was unable to find an explanation of what exactly partial support means here. Did they just alias __declspec(thread)?

I could just use boost::thread_specific_ptr, but there seem to be some reports that boost::thread_specific_ptr is slow. That may or may not be true.

Specifically, I want fast TLS on x86/x64 on the most recent Linux+GCC and Windows+MSVC. Fast meaning no system calls where possible (I think this is possible for the platforms above).

The Visual C++ 11 Developer Preview does not support the `thread_local` keyword. — James McNellis, Jan 07 '12 at 20:18
My understanding is that 'partial' here means the semantics are supported but not through the standard syntax/keyword. — ildjarn, Jan 07 '12 at 20:22
@ybungalobill Either GNU, POSIX, Microsoft, SUN, IBM, and the C++ std committee all provided facilities for something that is not needed in well designed software or your understanding is flawed. But I would like to hear your argument. — Eloff, Jan 07 '12 at 21:41
@ybungalobill: Naw, TLS can be useful. You are correct in the sense that TLS is essentially another global, but globals have their use. — GManNickG, Jan 07 '12 at 21:54
@GMan Eloff: when is TLS used? One case is `errno`, `GetLastError` - these are just another way of returning error codes. It may be more convenient than returning them directly, but I'm absolutely against it in general. I don't want each library to add one word of storage in each of my threads, which may not even use the library. Another case: contexts like the OpenGL rendering context. One may argue that it's convenient to set the context once and then assume that it's global, but then try rendering to multiple contexts from one thread, or designing an OOP wrapper for the context, to see why it's flawed. — Yakov Galka, Jan 07 '12 at 22:07
@ybungalobill: None of that has to do with TLS, those are just arguments against globals in general. That's fine, but that's not the point here. Take *good* uses for globals and then argue that TLS will never improve it, and is therefore unnecessary in a *good* program. — GManNickG, Jan 07 '12 at 22:12
@GMan: I've never seen *good* uses of globals. Well, except constant data, but constant data need not be TLS. — Yakov Galka, Jan 07 '12 at 22:19
@GMan: I disagree on `std::cout`. Assuming you mean that memory manager is 'the heap', then again: you may have more than one heap. Think why STL likes allocators. 'OS' is 'constant data' from the point of view of your program, i.e. the OS doesn't change while your program is running. — Yakov Galka, Jan 07 '12 at 22:26
@ybungalobill: I meant in implementation of the OS. But in the case of memory managers, more than one has nothing to do with global or not. You can have multiple global heaps. How would you implement `std::cout`? — GManNickG, Jan 07 '12 at 22:39
@GMan: OK, things that are physically global may be global (e.g. hardware registers and physical RAM), but they're irrelevant for TLS. `std::cout` has formatting state, so it's a mutable object. I would prefer something like `std::ostream local_cout(standard_output);` where `standard_output` doesn't change during process execution. — Yakov Galka, Jan 07 '12 at 22:55
@ybungalobill globals have their uses. So does TLS. For example, what if you want to generate random numbers? Using local RNGs is expensive and also a potentially bad idea depending on the seed. It makes sense to use a global RNG. But it's not thread-safe. Enter TLS: then you have an RNG per thread. The same applies if you want a pooled allocator for a certain type of object: you wouldn't make it a local because you clearly want to share it. You could make it thread-safe by making a pool per thread (as long as you free objects on the same thread you allocate them on). — Eloff, Jan 07 '12 at 23:10
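A minimal sketch of the per-thread RNG idea, using C++11's `thread_local` (which, per the comments above, VS11 did not yet accept) with a standard `<random>` engine. The function name `roll_die` and seeding each thread's generator from `std::random_device` are assumptions for illustration:

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: each thread lazily gets its own generator on first call,
// so there is no locking and no re-seeding on every call.
// Seeding from std::random_device is an assumption; any per-thread seed works.
int roll_die() {
    thread_local std::mt19937 rng{std::random_device{}()};
    std::uniform_int_distribution<int> dist(1, 6);
    return dist(rng);
}
```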
@ybungalobill: Interesting idea, I like it. :) That said, there's still a global aspect, and sometimes global things are easier. Consider a log file, for example. — GManNickG, Jan 07 '12 at 23:36
The value of thread local storage is so that things that would otherwise be global for the whole program can be made less global. For example errno used to be a global and that meant that multithreaded programs couldn't do checks for errors signaled with errno reliably. TLS > globals. Global variables should really be thread local and you should have to add a special keyword to get program global variables. — bames53, Jan 08 '12 at 07:35
@ybungalobill: TLS can be important in a few cases. I use it in my library because I need some information to be persisted across library calls on a per-thread basis, and because it is a library, I don't control the thread's stack, and can't very well mandate that the caller should place some particular object on their stack. I don't disagree that TLS is *usually* something best avoided, but it's there for a reason. Or to put it another way, well designed *programs* might not need TLS, but well designed *libraries* sometimes do. — jalf, Jan 08 '12 at 15:48
@jalf: I don't see why you can't mandate that the caller create a context that is passed to your library, i.e. `YourLibContext lib; lib.JumpFromTower();`. This gives you even more flexibility and has one less indirection per access compared to TLS. **Eloff:** what? How are local RNGs more expensive? RNGs are a good example of when you want *neither* globals nor TLS. — Yakov Galka, Jan 08 '12 at 16:06
@ybungalobill: because that would make it harder to use, more error-prone, and harder to retrofit into existing code. I did experiment with that approach, but it would've become a pain to use :) — jalf, Jan 08 '12 at 16:16
@jalf: It is certainly *more verbose* and thus may cause pain in *simple* cases. But it's in no way more error-prone. I would sum it up as: "let the caller decide where to allocate the context because he knows better" versus "allocate the context automatically, leaving no room for customization, and force half the threads to pay for what they don't use". — Yakov Galka, Jan 08 '12 at 16:41
@ybungalobill: given that in my case, the context is absolutely an implementation-detail, and it's pretty important that you have precisely one context per thread accessing my library, I disagree. Having two contexts in the same thread would break the semantics of the library. I did think this through quite carefully. ;) — jalf, Jan 08 '12 at 16:45
The library is to handle synchronization between threads. Imagine if you had to pass around *the same* context everywhere in order to use locks or mutexes. Making that context accessible through callbacks would be an utter pain. And if you ever create more than one context in a single thread, it's an error, and the synchronization won't behave as you'd expect. I would love to avoid all forms of globals, but in my scenario, it's really the only approach that works. — jalf, Jan 08 '12 at 16:49
@ybungalobill no, some RNGs have considerable space overhead; clearly it's stupid to allocate and seed one if you only plan to generate one number inside a function. Allocating an RNG higher up the call stack and passing it through each function is a poor design IMHO. But the real problem with local RNGs is that you must choose the seed carefully. Create one inside the current function, seed it with the current time (as is typical), then generate a few numbers and return. Call that function inside a tight loop and you will get the exact same seed (and numbers) for a large run of calls! — Eloff, Jan 10 '12 at 20:44
@jalf: OK, in your case you indeed need one instance per thread *by definition of the problem*. It's the same reason you have, e.g., one stack per thread. These are problems that are directly bound to the *instruction pointer*. I'm talking about things whose lifetime there's no reason to bind to the thread, but people still think it's 'nice to do so', e.g. random number generators. No, **Eloff**, your argument is still invalid. I didn't say they don't have an overhead. Nor did I say to create one each time you want to draw a number, or to pass it as an explicit parameter. — Yakov Galka, Jan 10 '12 at 21:44
@Eloff: ... You just have to analyze your design more carefully. Decide *what is the lifetime of the RNG you need*, and bind it to the object of said lifetime. Are you writing a game? No problem, the RNG will be a member of the game state and passed everywhere indirectly through the pointer to the game. Is it a Monte Carlo simulation running in multiple threads in parallel? No problem, initialize one RNG in the object that encapsulates the simulation. — Yakov Galka, Jan 10 '12 at 21:44
Benefits: 1) *no redundant copies are created* for threads that don't use it. 2) you may run *two* instances of the simulation in *one* thread (alternating, if supported) and the result won't depend on the order of alternation. 3) you can serialize the simulation and continue from where you stopped, even on another machine with a different number of threads. 4) you can unit-test your code. Cons: of course, if your case is a simple homework assignment to print a list of random words, *then this design is not for you*, and even plain old `rand()` suits your needs just fine. — Yakov Galka, Jan 10 '12 at 21:45
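A minimal sketch of the design argued for in the last few comments, binding the RNG to the object whose lifetime it should share rather than to a thread. The `Simulation` class and its toy random walk are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical simulation type: the RNG lives exactly as long as the
// simulation, not as long as whichever thread happens to run it.
class Simulation {
public:
    explicit Simulation(std::uint32_t seed) : rng_(seed) {}

    // One step of a toy random walk driven by this simulation's own RNG.
    int step() {
        std::uniform_int_distribution<int> coin(0, 1);
        position_ += (coin(rng_) == 1) ? 1 : -1;
        return position_;
    }

    int position() const { return position_; }

private:
    std::mt19937 rng_;   // owned state: can be copied, reseeded, or tested
    int position_ = 0;
};
```

Because each instance owns its generator, two identically seeded simulations stay in lockstep even when their steps are interleaved on one thread, which is benefit 2) above.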

score 7 · Accepted Answer · answered Jan 07 '12 at 21:51

So I did some digging into `thread_local` semantics. GCC's `__thread` and MSVC's `__declspec(thread)` have the same semantics as each other and as `thread_local` (dynamic initialization aside, which may or may not have made it into the standard yet). So this is really a non-issue for my use case. I'll just make a define that aliases one or the other platform-specific attribute.
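A minimal sketch of such a define. The macro name `FAST_TLS` and the helper functions are hypothetical, and the sketch assumes the macro is only applied to variables with constant initializers and trivial destructors (the "dynamic initialization aside" case the answer describes):

```cpp
#include <thread>

// Hypothetical macro: pick the compiler-specific TLS attribute.
// Only use it on constant-initialized, trivially destructible variables.
#if defined(_MSC_VER)
#  define FAST_TLS __declspec(thread)
#elif defined(__GNUC__)
#  define FAST_TLS __thread
#else
#  define FAST_TLS thread_local  // fall back to the C++11 keyword
#endif

// One counter per thread; for __thread, the attribute must follow 'static'.
static FAST_TLS int tls_counter = 0;

int bump_counter() { return ++tls_counter; }

// Runs bump_counter() on a fresh thread and reports what that thread saw.
int bump_in_new_thread() {
    int seen = 0;
    std::thread t([&seen] { seen = bump_counter(); });
    t.join();
    return seen;
}
```

Each thread sees its own `tls_counter`, so `bump_in_new_thread()` returns 1 no matter how many times the calling thread has bumped its copy.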

Eloff

Sadly, the problem with these mechanisms is that they don't support non-POD types. When a thread is terminated, I want its TLS objects to have their destructors called. Neither `__thread` nor `__declspec(thread)` can handle that. But if you don't need that, this approach should work fine. – jalf Jan 08 '12 at 15:49
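For comparison, a sketch of what C++11's `thread_local` keyword adds over the two attributes, assuming a compiler that fully implements it (per the comments on the question, VS11 did not): a per-thread object with a non-trivial constructor and destructor, where the destructor runs as the owning thread exits. `PerThreadState` and the `destroyed` counter are hypothetical names for illustration:

```cpp
#include <atomic>
#include <string>
#include <thread>

// Counts how many per-thread objects have been destroyed (for illustration).
std::atomic<int> destroyed{0};

struct PerThreadState {
    std::string name;                     // non-POD member
    PerThreadState() : name("worker") {}  // dynamic initialization
    ~PerThreadState() { ++destroyed; }    // runs when the owning thread exits
};

// Constructed in each thread that uses it; destroyed at that thread's exit.
thread_local PerThreadState state;

int destroyed_after_one_thread() {
    int before = destroyed.load();
    std::thread t([] { state.name += "-1"; });  // use forces construction in t
    t.join();  // thread exit (including TLS destructors) completes before join returns
    return destroyed.load() - before;
}
```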
You don't get non-trivial construction/destruction for free anyway, so if you need that, there are more performant (no, it's not a word, but it should be) mechanisms. I went with a `__thread context*`, allocated the context on the stack in the thread start method, and set the TLS `context*` to point to it. Then I get proper construction/destruction, and access to it should be almost as fast as possible. – Eloff Jan 10 '12 at 20:58
@Eloff If it's "allocated ... on the stack" anyway, then `thread_local` makes no sense, because each thread has a separate "stack". – Top-Master Jan 04 '23 at 18:17
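The pattern Eloff describes (only a `context*` is thread-local; the context object itself lives on each thread's own stack, created in the thread start routine) can be sketched as follows. `Context`, `thread_start`, `do_work`, and the output parameters are hypothetical; the pointer uses C++11 `thread_local`, though because it is a constant-initialized raw pointer, `__thread` or `__declspec(thread)` would do equally well:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-thread context with non-trivial construction/destruction.
struct Context {
    int thread_index;
    std::vector<std::string> log;  // non-POD is fine: the object lives on the stack
    explicit Context(int index) : thread_index(index) {}
};

// Only a raw pointer is thread-local, so a trivial TLS attribute suffices.
thread_local Context* tls_context = nullptr;

// Access from anywhere in the thread without passing the context around.
Context& current_context() {
    assert(tls_context != nullptr && "called outside thread_start()");
    return *tls_context;
}

void do_work() { current_context().log.push_back("working"); }

// Thread start routine: the context lives exactly as long as this frame,
// so its constructor and destructor run once per thread.
void thread_start(int index, int* out_index, std::size_t* out_log_size) {
    Context ctx(index);
    tls_context = &ctx;
    do_work();
    do_work();
    *out_index = current_context().thread_index;
    *out_log_size = ctx.log.size();
    tls_context = nullptr;  // don't leave a dangling pointer behind
}
```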