Dynamic TLS in C++11

Question

I'm writing a C++11 class Foo, and I want to give each instance its own thread-local storage of type Bar. That is to say, I want one Bar to be allocated per thread and per Foo instance.

If I were using pthreads, Foo would have a nonstatic member of type pthread_key_t, which Foo's constructor would initialize with pthread_key_create() and Foo's destructor would free with pthread_key_delete(). Or if I were writing for Microsoft Windows only, I could do something similar with TlsAlloc() and TlsFree(). Or if I were using Boost.Thread, Foo would have a nonstatic member of type boost::thread_specific_ptr.
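For concreteness, the pthreads variant described above might look roughly like the following. This is only a sketch under assumptions: `Bar` is a placeholder struct, and `destroy_bar`, `worker`, and `main_thread_value_after_worker` are hypothetical names made up for the demo. Note that `pthread_key_delete()` does not run the per-thread destructors, so this sketch frees only the destroying thread's Bar in `~Foo`; Bars belonging to other still-running threads would leak.

```cpp
#include <cassert>
#include <pthread.h>

struct Bar { int value = 0; };  // placeholder payload, assumed for illustration

// Run by pthreads at thread exit for every thread whose slot is non-null.
extern "C" void destroy_bar(void* p) { delete static_cast<Bar*>(p); }

class Foo {
public:
    Foo() {
        int rc = pthread_key_create(&key_, destroy_bar);
        assert(rc == 0);
        (void)rc;
    }
    ~Foo() {
        // Free the calling thread's Bar; other threads' Bars are NOT freed here.
        delete static_cast<Bar*>(pthread_getspecific(key_));
        pthread_key_delete(key_);
    }
    Foo(const Foo&) = delete;
    Foo& operator=(const Foo&) = delete;

    // Returns the Bar specific to this Foo and to the calling thread.
    Bar* get_bar() {
        void* p = pthread_getspecific(key_);
        if (p == nullptr) {
            p = new Bar();
            pthread_setspecific(key_, p);
        }
        return static_cast<Bar*>(p);
    }

private:
    pthread_key_t key_;
};

// Hypothetical worker: writes to its own Bar for the shared Foo.
extern "C" void* worker(void* arg) {
    static_cast<Foo*>(arg)->get_bar()->value = 42;
    return nullptr;
}

// Hypothetical demo: the main thread's Bar is unaffected by the worker.
int main_thread_value_after_worker() {
    Foo f;
    f.get_bar()->value = 7;
    pthread_t t;
    int rc = pthread_create(&t, nullptr, worker, &f);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, nullptr);
    return f.get_bar()->value;
}
```

The key is a nonstatic member, so every Foo gets its own slot in every thread, which is exactly the "one Bar per thread per Foo" shape I'm after.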

In reality, however, I am trying to write portable C++11. C++11's thread_local keyword does not apply to nonstatic data members. So it's fine if you want one Bar per thread, but not if you want one Bar per thread per Foo.
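To illustrate the restriction, here is a small sketch (the names `Counter` and `hits_seen_by_new_thread` are invented for this example). `thread_local` is accepted on a static data member, which gives one value per thread shared by all instances, but not on a nonstatic one.

```cpp
#include <cassert>
#include <thread>

struct Counter {
    // OK: thread_local combined with static gives one copy per thread.
    static thread_local int hits;

    // thread_local int per_instance;  // ill-formed: thread_local is not
    //                                 // allowed on a nonstatic data member
};

// The out-of-class definition must repeat thread_local.
thread_local int Counter::hits = 0;

// Hypothetical demo helper: returns the value a brand-new thread observes.
int hits_seen_by_new_thread() {
    Counter::hits = 5;  // modifies the calling thread's copy only
    int seen = -1;
    std::thread t([&seen] { seen = Counter::hits; });  // new thread, fresh copy
    t.join();
    return seen;
}
```

The new thread sees the initial value rather than 5, because each thread has its own `Counter::hits`. There is still only one such value per thread, no matter how many `Counter` objects exist.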

So as far as I can tell, I need to define a thread-local map from Foos to Bars, and then deal with the question of how to clean up appropriately whenever a Foo is destroyed. But before I undertake that, I'm posting here in the hope that someone will stop me and say "There's an easier way."

(Btw, the reason I'm not using either pthread_key_create() or boost::thread_specific_ptr is that, if I understand correctly, they assume that all threads will be spawned using pthreads or Boost.Thread respectively. I don't want to make any assumptions about how the users of my code will spawn threads.)

I'm starting to think the best approach is to use boost::thread_specific_ptr after all, but I have [some concerns](http://stackoverflow.com/questions/22448022/using-boostthread-specific-ptr-in-a-non-boost-thread). — slyqualin, Mar 17 '14 at 06:57
Would it be sufficient to write a small allocator that returns pointers into a thread_local std::list or something similar? Then you pass this allocator to a std::shared_ptr which is a data member of Foo. Technically, the storage would ultimately come from non-TLS areas of course. C++ doesn't have a way to allocate a dynamic amount of TLS, but you could also have a thread_local raw pointer to memory inside the allocator. — jared_schmitz, Mar 21 '14 at 03:14
@jared_schmitz writes `Then you pass this allocator to a std::shared_ptr which is a data member of Foo.` So Foo has just one such data member, yes? If I understand correctly, that means that for each Foo there is only one Bar (which corresponds to an entry in a list that is local to the thread in which the Foo was created). But I want to have N Bars per Foo, where N is the number of threads referring to the Foo. Of course I may well be misunderstanding, so please set me straight. :) — slyqualin, Mar 21 '14 at 04:22
Ah I misinterpreted your question then. Let me try and cook up some code and put it in a proper answer. — jared_schmitz, Mar 22 '14 at 02:19
I actually need clarification. You say in the question that the ctor/dtor are responsible for handling the TLS, but in your most recent comment you say "N is the number of threads referring to the Foo". Do you mean N is actually the number of threads that have a handle to an instance of Foo? Or is N just the number of threads in existence, and constant across all Foo instances? — jared_schmitz, Mar 22 '14 at 02:38
Sorry, that "number of threads referring" remark was sloppy. What really happens is that, once a Foo f has been created, then user code in any thread can call f.get_bar(), which gets the Bar object that is specific to f and also specific to the thread making that call -- and there can be multiple threads making such calls on the same f. — slyqualin, Mar 22 '14 at 06:44
So for clarification: if I were in a purely POSIX world, Foo's ctor/dtor would call pthread_key_create() and pthread_key_delete(), and Foo::get_bar() would call pthread_getspecific() and/or pthread_setspecific(). – slyqualin, Mar 22 '14 at 06:49

score 2 · Answer 1 · answered Mar 22 '14 at 21:18

You would like Foo to contain a thread_local variable of type Bar. Since, as noted, thread_local cannot apply to a nonstatic data member, we have to do something more indirect. The underlying behavior will be for up to N instances of Bar to exist for each instance of Foo, where N is the number of threads in existence (a thread's Bar is created the first time that thread asks for it).

Here is a somewhat inefficient way of doing it. With more code, it could be made faster. Basically, Foo gets a static thread-local map, so each thread holds its own map from Foo instances to Bars.

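#include <unordered_map>

class Bar { /* ... */ };

class Foo {
private:
  // One map per thread, shared by every Foo that thread touches.
  static thread_local std::unordered_map<Foo*, Bar> tls;
public:
  // All internal member functions must use this too.
  Bar *get_bar() {
    auto I = tls.find(this);
    if (I != tls.end())
      return &I->second;
    // emplace returns pair<iterator, bool>; the iterator points at the new entry.
    auto II = tls.emplace(this, Bar()); // Could use std::piecewise_construct here...
    return &II.first->second;
  }
};

// The static data member needs an out-of-class definition.
thread_local std::unordered_map<Foo*, Bar> Foo::tls;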
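To see the per-thread, per-Foo behavior in action, here is a minimal self-contained sketch of the same idea. The `Bar` payload, the destructor, and `demo_main_thread_value` are assumptions added for illustration. In particular, the destructor shown can only erase the entry belonging to the thread that destroys the Foo; entries in other threads' maps stay behind until those threads exit.

```cpp
#include <cassert>
#include <thread>
#include <unordered_map>

struct Bar { int value = 0; };  // placeholder payload, assumed for illustration

class Foo {
public:
    // Removes only the destroying thread's entry; other threads' maps
    // still hold a stale entry keyed by this (now dangling) pointer.
    ~Foo() { tls.erase(this); }

    // operator[] default-constructs the Bar on a thread's first call.
    Bar* get_bar() { return &tls[this]; }

private:
    static thread_local std::unordered_map<Foo*, Bar> tls;
};

thread_local std::unordered_map<Foo*, Bar> Foo::tls;

// Hypothetical demo: a worker thread gets its own Bar for the same Foo.
int demo_main_thread_value() {
    Foo f;
    f.get_bar()->value = 1;

    int worker_initial = -1;
    std::thread t([&] {
        worker_initial = f.get_bar()->value;  // fresh Bar for this thread
        f.get_bar()->value = 99;              // does not touch main's Bar
    });
    t.join();

    assert(worker_initial == 0);
    return f.get_bar()->value;
}
```

Because a stale entry is keyed by the old address, a later Foo allocated at the same address could pick up a leftover Bar in another thread. That is part of the cleanup problem raised in the question.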

jared_schmitz

To quote from my original question: `So as far as I can tell, I need to define a thread-local map from Foos to Bars, and then deal with the question of how to clean up appropriately whenever a Foo is destroyed. But before I undertake that, I'm posting here in the hope that someone will stop me and say "There's an easier way."` So I agree that your suggestion could be made to work, but I was hoping to find something easier. See also [here](http://stackoverflow.com/questions/22448022/using-boostthread-specific-ptr-in-a-non-boost-thread). – slyqualin Mar 23 '14 at 04:02
(Btw, if I was going to build a map, it might be at least equally easy to do it the other way around, i.e. don't use any ready-made TLS mechanism, and instead let Foo contain a nonstatic map from thread IDs to Bars, along with a mutex to protect it. Of course, that would just be yet another reinvention of TLS ...) – slyqualin Mar 23 '14 at 05:13
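The "other way around" idea from the comment above could be sketched roughly like this. It is only an illustration under assumptions: `Bar` is a placeholder, and `bar_count` and `bars_after_two_threads` are hypothetical helpers. Nothing here removes an entry when a thread exits, and thread IDs may be reused once a thread has finished, so a later thread could inherit a stale Bar.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>

struct Bar { int value = 0; };  // placeholder payload, assumed for illustration

class Foo {
public:
    // Returns the calling thread's Bar, creating it on first use.
    // std::map nodes are stable, so the pointer stays valid after the
    // lock is released, even if other threads insert later.
    Bar* get_bar() {
        std::lock_guard<std::mutex> lock(mutex_);
        return &bars_[std::this_thread::get_id()];
    }

    // Hypothetical helper for the demo: number of per-thread Bars so far.
    std::size_t bar_count() {
        std::lock_guard<std::mutex> lock(mutex_);
        return bars_.size();
    }

private:
    std::mutex mutex_;
    std::map<std::thread::id, Bar> bars_;  // nonstatic: one map per Foo
};

// Hypothetical demo: two threads touching one Foo produce two Bars.
std::size_t bars_after_two_threads() {
    Foo f;
    f.get_bar()->value = 1;
    std::thread t([&f] { f.get_bar()->value = 2; });
    t.join();
    assert(f.get_bar()->value == 1);  // main's Bar was not overwritten
    return f.bar_count();
}
```

Here destroying a Foo cleans up all of its Bars automatically, but every `get_bar()` call pays for a mutex, and thread exit goes unnoticed.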
It's likely that the compiler implementation would use a better locking scheme, so it should be faster to use `thread_local`. Btw, I believe that by templating this and putting it behind a pointer-like interface would be exactly the behavior of `boost::thread_specific_ptr`, but with C++ threads. Also, with respect to not making assumptions about how users spawn threads, unless you use the `thread_local` keyword, I don't think a hand-rolled TLS implementation is guaranteed to work. – jared_schmitz Mar 23 '14 at 21:19
You write `I believe that by templating this and putting it behind a pointer-like interface would be exactly the behavior of boost::thread_specific_ptr`. Before you can make that claim, I think you need to write Foo's destructor, which has to remove `this` from `Foo::tls` for all threads. (Following your analogy, that corresponds to what boost::thread_specific_ptr's destructor does.) – slyqualin Mar 24 '14 at 00:38
... and, to be fair, there's a corresponding problem for my suggestion of building the map the other way around, (i.e. a per-Foo map from thread IDs to Bars, rather than a per-thread map from Foos to Bars); i.e. I have to detect when a thread exits, and delete its map entry. I suspect that is the underlying reason why boost::thread_specific_ptr is limited to boost threads, or threads that call `boost::on_thread_exit()` before they die. (See [here](http://stackoverflow.com/questions/22448022/using-boostthread-specific-ptr-in-a-non-boost-thread).) – slyqualin Mar 24 '14 at 00:47
Hm, then I don't believe it's possible to do without knowing how the users spawn the threads. You need to take action precisely when a thread is created or destroyed. `boost::thread_specific_ptr` specifically hooks into Win32 or pthreads. You also have to solve the inverse problem of deleting _all_ related TLS data for a particular Foo, where the deletion occurs in one thread. I don't believe that the Boost solution does this. The destructor for `boost::thread_specific_ptr` deletes the TLS for the calling thread, but can only be called once. The rest are deleted at thread-exit. – jared_schmitz Mar 24 '14 at 01:10
`boost::thread_specific_ptr specifically hooks into Win32 or pthreads.` I didn't know that. Do you have a reference? And I wonder why it chose Win32 and pthreads. Could it be that most C++11 thread implementations use those two also? The reason I ask all this is that, if boost::thread_specific_ptr is known to work on a wide enough set of platforms, and I can rely on it, then it is a solution to my practical problem. (Of course I'll have to advertise the list of platforms to my users, but that's okay as long as the list covers just about everybody.) – slyqualin Mar 24 '14 at 02:37
I don't have a reference beyond some #ifdefs which error out if either Win32 threads or POSIX threads aren't available, in boost/thread/detail/. C++11 threads will almost certainly be backended to these for desktop environments, because it covers Win32 and POSIX platforms (really OS X and the rest of the traditional *nix platforms). In short, Boost and the standard C++ library implementations will very likely be using the same OS thread interface, so using Boost should be OK. – jared_schmitz Mar 24 '14 at 17:50
