What's the best way to implement a threadsafe ordered(note1) mapping/hash in C++? Aka, a fast-lookup data structure (aka, not a queue) that different threads can iterate across, occasionally inserting or deleting elements without interfering with the other threads' activity?
std::map is not threadsafe, and its operations are non-atomic - although only erasure invalidates iterators
wrapping every function in the whole map class doesn't fix the problem - you can have loose iterators out there pointing to a node that gets erased by another thread. It should either lock and prevent the deletion until the current thread is the only one referencing it, or use the UNIX-filesystem-style "dangling but still valid reference upon deletion" approach
- tbb::concurrent_hash_map is designed to be threadsafe, but its iterators still invalidate upon deletion of their key. Even if the Value is a smart pointer the data won't get preserved, as the reference in the iterator gets lost.
- One can use tbb:concurrent_hash_map by iterating through keys instead of iterators (one can look up a key as O(1) instead of O(log N) like std::map), but since it's a hash, it lacks order-dependent functions like upper_bound, lower_bound, etc. Critically, if a key is left dangling by an erase in another thread, there's no obvious way to tell the code to jump back to the nearest element to that key in the hash.
- std::unordered_map has the possibility that one could try to jury-rig finding the nearest element if your key gets deleted via the bucket access functions. But std::unordered_map is not considered threadsafe (although it could potentially be wrapped to be threadsafe). tbb::concurrent_hash_map is threadsafe (subject to the constraints above) but it doesn't allow sufficient bucket access to do this.
- std::map, accessed by keys instead of iterators, could find the next nearest element in the map if a key gets deleted by another thread. But every lookup would be O(log N) instead of O(1).
- Also, as mentioned, std::map is non-atomic. It would have to be wrapped. Using keys instead of iterators also requires wrapping the whole class because one has to change all of the functions that normally take or return iterators to take or return keys instead.
- Someone on one site that I read said that std::map doesn't work well in threaded environments regardless (compared to hashes) because threads don't play nicely with tree rebalancing, although I don't really know why that would be or whether it's even generally applicable/accurate.
What do you think would be best?
** Note1: When I wrote "ordered", I meant that only from the perspective of "has a reliable ordering that one can iterate across", not "the iteration must go across in the order of the keys". After I wrote that, I realized that a couple of my use cases actually do care about the order of iteration (most don't). But either way, I could probably jury-rig a proper ordering by working a linked list into the Value type. Just more slow ugliness and potential for problems/oversights...
** Note2: New thought. Too bad map isn't templated over its iterator type... how hard would it be to change the iterator std::map is built on? I had before been playing around with the idea of having iterators be reference counting (like std::shared_ptr) but I kept mistakenly thinking of implementing the reference counts with a secondary data structure inside the iterators themselves, and that always turned out way too ugly/slow/impractical. But it just occurred to me now that one could include the reference counting inside the Value of the map's key:value pair. That is, every Value would include A) a reference counter (default=0) that each interator increments whenever it reaches it (operator=, operator++, operator--, etc) and decrements whenever it leaves it; and B) an erasure flag (default=false) that the erase function sets. Whenever an iterator decrements the reference counter to zero, if the erasure flag is set then it would then actually erase it.
Seems to me that while it's a performance hit (the extra increments/decrements/checks), it's certainly nothing on the order of having to do a full map lookup every time you want to step through the structure. Can anyone think of a practical way to implement this?