
What's the best way to implement a threadsafe ordered(note1) mapping/hash in C++? That is, a fast-lookup data structure (i.e., not a queue) that different threads can iterate across, occasionally inserting or deleting elements without interfering with the other threads' activity?

  • std::map is not threadsafe, and its operations are non-atomic - although only erasure invalidates iterators

• wrapping every function of the whole map class doesn't fix the problem - you can still have loose iterators out there pointing to a node that gets erased by another thread. Erasure should either lock and block the deletion until the current thread is the only one referencing the node, or use the UNIX-filesystem-style "dangling but still valid reference upon deletion" approach (see the sketch after this list)

• tbb::concurrent_hash_map is designed to be threadsafe, but its iterators are still invalidated upon deletion of their key. Even if the Value is a smart pointer, the data won't be preserved, as the reference held in the iterator is lost.
• One can use tbb::concurrent_hash_map by iterating through keys instead of iterators (one can look up a key in O(1) instead of O(log N) as with std::map), but since it's a hash, it lacks order-dependent functions like upper_bound, lower_bound, etc. Critically, if a key is left dangling by an erase in another thread, there's no obvious way to tell the code to jump back to the nearest element to that key in the hash.
• std::unordered_map offers the possibility of jury-rigging a find-the-nearest-element fallback, via the bucket-access functions, for when your key gets deleted. But std::unordered_map is not considered threadsafe (although it could potentially be wrapped to be). tbb::concurrent_hash_map is threadsafe (subject to the constraints above), but it doesn't allow sufficient bucket access to do this.
  • std::map, accessed by keys instead of iterators, could find the next nearest element in the map if a key gets deleted by another thread. But every lookup would be O(log N) instead of O(1).
• Also, as mentioned, std::map is non-atomic, so it would have to be wrapped. Using keys instead of iterators also requires wrapping the whole class, because all of the functions that normally take or return iterators have to be changed to take or return keys instead.
  • Someone on one site that I read said that std::map doesn't work well in threaded environments regardless (compared to hashes) because threads don't play nicely with tree rebalancing, although I don't really know why that would be or whether it's even generally applicable/accurate.
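
To make the "dangling but still valid" option above concrete, here's a rough sketch of what I'm picturing: a mutex-wrapped std::map whose Values are shared_ptrs, stepped through by key via lower_bound. All the names here are just illustrative, not an existing library:

#include <map>
#include <memory>
#include <mutex>

template <typename K, typename V>
class SharedValueMap {              // illustrative name
    mutable std::mutex mut;
    std::map<K, std::shared_ptr<V>> m;
public:
    std::shared_ptr<V> find(const K& key) const {
        std::lock_guard<std::mutex> g(mut);
        auto it = m.find(key);
        if (it == m.end()) return nullptr;
        return it->second;
    }
    void insert(const K& key, std::shared_ptr<V> value) {
        std::lock_guard<std::mutex> g(mut);
        m[key] = std::move(value);
    }
    // erasure removes the entry, but any thread still holding the
    // shared_ptr keeps a valid V -- the unix-unlink-style semantics
    void erase(const K& key) {
        std::lock_guard<std::mutex> g(mut);
        m.erase(key);
    }
    // step to the next key at or after `key`; O(log N) per step,
    // but it survives another thread erasing the key under us
    std::shared_ptr<V> next(const K& key, K& found) const {
        std::lock_guard<std::mutex> g(mut);
        auto it = m.lower_bound(key);
        if (it == m.end()) return nullptr;
        found = it->first;
        return it->second;
    }
};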

What do you think would be best?

** Note1: When I wrote "ordered", I meant that only from the perspective of "has a reliable ordering that one can iterate across", not "the iteration must go across in the order of the keys". After I wrote that, I realized that a couple of my use cases actually do care about the order of iteration (most don't). But either way, I could probably jury-rig a proper ordering by working a linked list into the Value type. Just more slow ugliness and potential for problems/oversights...

** Note2: New thought. Too bad map isn't templated over its iterator type... how hard would it be to change the iterator std::map is built on? I had been playing around with the idea of having iterators be reference-counting (like std::shared_ptr), but I kept mistakenly thinking of implementing the reference counts with a secondary data structure inside the iterators themselves, and that always turned out way too ugly/slow/impractical. But it just occurred to me that one could include the reference counting inside the Value of the map's key:value pair. That is, every Value would include A) a reference counter (default = 0) that each iterator increments whenever it reaches it (operator=, operator++, operator--, etc.) and decrements whenever it leaves it; and B) an erasure flag (default = false) that the erase function sets. Whenever an iterator decrements the reference counter to zero, if the erasure flag is set, it would then actually erase the node.

Seems to me that while it's a performance hit (the extra increments/decrements/checks), it's certainly nothing on the order of having to do a full map lookup every time you want to step through the structure. Can anyone think of a practical way to implement this?
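
A skeleton of what Note 2's Value might look like (again, just my own illustrative names; the race between the last departure and a new arrival would still need a per-node lock or CAS loop, which this sketch omits):

#include <atomic>

template <typename V>
struct PinnedValue {                  // hypothetical wrapper for the map's Value
    V value;
    std::atomic<int>  pins{0};        // incremented/decremented by iterators
    std::atomic<bool> erased{false};  // set by erase() instead of unlinking
};

// the pinning protocol a custom iterator would follow:
//   on arrival   (operator=, ++, --):  node->pins.fetch_add(1);
//   on departure (or destruction):     if (node->pins.fetch_sub(1) == 1
//                                          && node->erased)
//                                          /* unlink the node for real */;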

KarenRei
  • order and hashing are not going to work together - you need to pick one. – Nim Oct 26 '15 at 21:47
  • Doesn't have to be a hash, as per the question. Trying to mimic imposed order on a hash table (lock => find the key => get the matching iterator => increment iterator => unlock; in the case that the key has been deleted, search through buckets to mimic upper/lower bound) is discussed as an option. An ugly option. But so far I haven't come up with any pretty options, just several different ugly options. So if you have any better ideas, please let me know :) – KarenRei Oct 26 '15 at 21:53
  • @user416650 Your question is way too broad and vague to get concise answers, actually. – πάντα ῥεῖ Oct 26 '15 at 22:06
  • How big is your map? What requirements do you have other than correct operation? – Surt Oct 26 '15 at 22:07
  • @πάντα ῥεῖ - The question is very simple. "How do you do a threadsafe, ordered mapping in C++"? Everything else was just describing the possibilities I've considered so far. – KarenRei Oct 26 '15 at 22:09
  • @Nim Why? Assuming sufficient memory and willingness to endure the additional insertion/deletion time, there is nothing preventing you from writing a class that implements both a hash and a linked list, facilitating both fast lookups by a specific key and ordered lookups when you need to iterate. – Olipro Oct 26 '15 at 22:15
  • @Surt: I haven't locked it into a particular size yet - it's a cache for a distributed hash table, so it'll take tuning. As per my update above, it has to: 1) map key->value pairs; 2) be fully threadsafe - a deletion from one thread can't ever cause another thread to crash, even if the other thread is trying to work with that key at the time, and insertions likewise can't crash other threads; 3) ideally have iteration proceed in key order, so that I don't have to jury-rig that with a linked list or similar in the Value. – KarenRei Oct 26 '15 at 22:15
  • What's the definition of "crash"? If another thread is working with a key, is it okay if the next function it calls returns an error? If not, you are going to have to require a lock on the entire data structure for any modification. – Warren Dew Oct 26 '15 at 22:33
  • @Warren Dew - Errors are fine. Crashes are not. What I need is, as mentioned, either for calls that delete a node to block until nobody else is using it, or for Values of deleted nodes to be left dangling but still valid (like a unix file that someone deletes while it's open and in use), or for iterators to spontaneously advance to the next valid element when the one they're pointing to becomes invalid - or something of that nature. I'm highly flexible as to the approach. – KarenRei Oct 26 '15 at 22:44
  • You only have C++11 access and not C++14? – Surt Oct 26 '15 at 22:54
  • You realize that thread safety is not a compositional property, right? I'm skeptical that what you really want is a thread-safe data map... you'll have race conditions regardless of its thread safety. – user541686 Oct 27 '15 at 00:26
  • Accessing any memory across threads generally isn't consistent. It's part of the reason you need locks around concurrent data structures. Fundamentally, it's why it's difficult to achieve the [*wait-free*](https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom) property, which was what you were initially describing. – Jason Oct 27 '15 at 00:47
  • @Mehrdad As long as all the read/write code is mutually exclusive, it should be okay. – Jason Oct 27 '15 at 00:53
  • @Mehrdad @Jason It depends entirely on what the task is. This is a ring cache. The elements are largely independent of each other (like a filesystem). It's perfectly acceptable behavior, for example, for a thread whose iterator's key has been deleted to continue reading from/writing to an orphaned Value. It's also fine for it to simply advance to the next element. Or many other possibilities - so long as it doesn't crash. – KarenRei Oct 27 '15 at 01:15
  • Perfect concurrency is neither a requirement nor expected. This is for a DHT. A DHT by its very nature is never going to be perfectly synced. Nodes can enter and alert some entities while failing to alert others; a node may be responsive and then suddenly fail to respond without logging off. At any given point in time, each node has more that it doesn't know about the network than it does know. Each node has to decide for itself what the truth is, and the truth is always fluid. – KarenRei Oct 27 '15 at 01:17
  • @Olipro, the OP is after *a container*; of course you can combine containers in any exotic combination, but you are not going to get this out of the box from any existing container (AFAIK). – Nim Oct 27 '15 at 08:45
  • @Nim Getting out-of-the-box thread safety is also likely to be a challenge. I think approaching a programming problem with the expectation of not creating your own container for bespoke requirements is unreasonable, hence my point stands. – Olipro Oct 27 '15 at 20:41

3 Answers


For some reason you missed tbb::concurrent_unordered_map, which is the hash table with thread-safe iteration support. It is based on the split-ordered-list algorithm, where elements are connected into a container-wide list structure in addition to the hash table, so iteration is straightforward. But it does not fit your requirements completely, since it does not support concurrent erasure.
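
A minimal sketch of the supported combination (the thread functions and values here are my own illustration): insertion and iteration can run concurrently, while unsafe_erase() must run alone:

#include <tbb/concurrent_unordered_map.h>
#include <thread>

tbb::concurrent_unordered_map<int, int> table;

void writer() {                         // safe alongside the reader below
    for (int i = 0; i < 1000; ++i)
        table.insert({i, i * i});
}

void reader() {                         // iterators stay valid during inserts
    for (auto it = table.begin(); it != table.end(); ++it) {
        volatile int x = it->second;    // just touch the element
        (void)x;
    }
}

int main() {
    std::thread w(writer), r(reader);
    w.join(); r.join();
    table.unsafe_erase(42);             // only once no other thread touches table
}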

This is a fundamental issue: in the absence of a memory-reclamation mechanism, it is hard to fuse fast traversal and safe erasure in one concurrent data structure at the same time. You have to choose: safety/consistency or speed.

With certain limitations and caution, you can do both traversal and deletion at the same time, as described in this blog. Basically, as long as you can interleave (mutually exclude) traversal and erasure, tbb::concurrent_hash_map can be used for concurrent traversal along with find & insert. The blog suggests an additional optimization with a double-check pattern, but it can be simplified to the following:

// assuming table is a tbb::concurrent_hash_map<key_type, mapped_type>
typedef tbb::concurrent_hash_map<key_type, mapped_type> table_t;
for( table_t::iterator it = table.begin(); it != table.end(); ++it ) {
    table_t::accessor acc;
    // a key cannot be changed, thus it is safe to read it without a lock
    table.find( acc, it->first );          // now take the element's lock
    if( acc->second.marked_for_deletion )
        table.erase( acc );                // erase only via the accessor
}

It is essentially similar to your Note 2 applied to the concurrent_hash_map case, since most of the overhead comes not from the lookup (for neighboring elements a cache miss is unlikely) but from synchronization with two locks (the internal bucket lock and the element's accessor).

But if the speed of such a traversal approach is too slow, or it is too hacky (relying on implementation details) for you, but you still desperately need to be able to delete elements of the concurrent hash table, consider using a RW lock like tbb::spin_rw_mutex along with tbb::concurrent_unordered_map. You need to find an optimal balance: acquire the read locks rarely enough that iteration, lookup & insertion proceed without too much overhead, and take the write lock for erasure rarely as well. This probably requires an additional scheme that marks elements and collects enough of them before they are really deleted. E.g., here is pseudo-code for such a hash table class:

template<typename key_type, typename mapped_type> // mapped_type must carry an is_marked_for_erase flag
class concurrent_hash_table_with_erase_and_traverse {
    tbb::concurrent_unordered_map<key_type, mapped_type> my_map;
    tbb::spin_rw_mutex  my_lock;           // acquired as writer for cleanup only
    tbb::atomic<size_t> my_trash_count;    // number of items marked for erase
    static const size_t threshold = 1000;  // cleanup trigger; tune as needed

public:
    void init_thread_for_concurrent_ops() { my_lock.lock_read(); }
    void release_thread()                 { my_lock.unlock(); } // assuming reader lock
    mapped_type read(key_type k) {
        // assert: under read lock (thread is initialized)
        if(my_trash_count > threshold) {  // time to remove items
            my_lock.unlock(); // release the reader lock
            // wait for all the other threads to leave this container
            // TODO: re-implement with try_lock and re-checking the condition
            my_lock.lock();   // acquire the writer lock

            if(my_trash_count > threshold) { // double-check
                my_trash_count = 0;
                for( auto it = my_map.begin(); it != my_map.end(); ) {
                    auto _it = it++;         // advance before erasing
                    if( _it->second.is_marked_for_erase )
                        my_map.unsafe_erase( _it );
                }
            }
            my_lock.unlock();    // release the writer lock
            my_lock.lock_read(); // re-acquire the reader lock
        }
        return my_map[k]; // note: element access is not protected as in concurrent_hash_map
    }
    void safe_erase(key_type k) {
        // assert: under read lock
        my_map[k].is_marked_for_erase = true;
        my_trash_count++;
    }
};
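
A hypothetical usage of this class (the Item type and worker function are illustrative; the mapped type just needs to carry the erase flag):

struct Item {                               // illustrative mapped type
    int  data = 0;
    bool is_marked_for_erase = false;
};

concurrent_hash_table_with_erase_and_traverse<int, Item> table;

void worker(int k) {
    table.init_thread_for_concurrent_ops(); // take the reader lock
    Item v = table.read(k);                 // may trigger the deferred cleanup
    table.safe_erase(k);                    // only marks; reclaimed later
    table.release_thread();                 // drop the reader lock
}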
Anton
  • I checked the blog, where they present two solutions. One is the "do a key lookup every time you advance the iterator" solution, which is of course painfully slow. The other only works with atomics, and is unreliable at that (and my experimenting with tbb makes me question whether even atomics would survive invalidation in their implementation). What sort of locking structure are you picturing that doesn't require locking out entire iterations of the whole data structure? Is there no way to just implement my reference-counting "Note2" above? – KarenRei Oct 27 '15 at 09:59
  • @KarenRei, it is not the lookup that is costly; there are two spin-locks on the way to the element. So most of the cost goes to atomic operations, and it would be the same if iterators did this locking/unlocking automatically as they advance. On average, the next element's bucket is located close to the previous one and will not cause a cache miss; thus it is the atomic ops in the locks that are responsible for most of the overhead. – Anton Oct 28 '15 at 04:19

Okay, so this took ages... and so much work that I decided to make a github project out of it ;) But I finally have a threadsafe map class. Complete with a full test suite and lots of configurable options. Hopefully if anyone else needs this they'll make use of it!

https://github.com/KarenRei/safe-map#

KarenRei

Sample thread-safe code that is probably not what you really wanted.

Warning: untested code.

#include <map>
#include <mutex>

template<typename kType, typename dType>
class Locked {
  mutable std::mutex mut;        // mutable so get() can lock while const
  std::map<kType, dType> theMap; // change types as required
public:
  dType get(const kType& key) const {
    std::lock_guard<std::mutex> g(mut);
    auto it = theMap.find(key);
    if (it != theMap.end())
      return it->second;         // return the mapped value, not the pair
    // throw, or return a sentinel dType
    return dType(-1); // or whatever
  }
  void set(const kType& key, const dType& data) {
    std::lock_guard<std::mutex> g(mut);
    theMap[key] = data;
  }
  void erase(const kType& key) { // "delete" is a keyword, so renamed
    std::lock_guard<std::mutex> g(mut);
    auto it = theMap.find(key);
    if (it != theMap.end()) {
      theMap.erase(it);
      return;
    }
    // throw?
  }
};

I don't think this will work if dType is a pointer unless it's a shared_ptr.

There can be only one.

It can be extended with a reader counter so that only set/delete block. (With the tbb map, set can even count as a read, since tbb allows thread-safe insertion.)

C++14 has std::shared_timed_mutex, which makes the reads a bit easier on performance.

C++17 has std::shared_mutex, which removes the timed element from it.
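
A sketch of that reader/writer variant (assuming C++17; with std::shared_timed_mutex it looks the same on C++14): lookups share the lock, set/erase take it exclusively. The class name is illustrative:

#include <map>
#include <mutex>
#include <shared_mutex>

template<typename kType, typename dType>
class RwLocked {                   // illustrative extension of the class above
  mutable std::shared_mutex mut;
  std::map<kType, dType> theMap;
public:
  bool get(const kType& key, dType& out) const {
    std::shared_lock<std::shared_mutex> g(mut); // many readers at once
    auto it = theMap.find(key);
    if (it == theMap.end()) return false;
    out = it->second;
    return true;
  }
  void set(const kType& key, const dType& data) {
    std::unique_lock<std::shared_mutex> g(mut); // exclusive for writers
    theMap[key] = data;
  }
  void erase(const kType& key) {
    std::unique_lock<std::shared_mutex> g(mut);
    theMap.erase(key);
  }
};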

Now there are loads of different lock-free, wait-free etc. implementations to get around the performance problems.

Depending on the actual load, a spin lock instead of the mutexes might help, up to a certain point.

Surt
  • That's one of the possibilities I mentioned. The actual implementation would have to be far more complicated of course - there's a lot of important map functions not wrapped here (including everything related to iterating), there's a distinction between a failed get because it's at the end vs. because the key was deleted, and about 50 other issues. But it is a possibility. Complex and slow (lookups are O(log N)), but probably the best of the bad options I thought of earlier. – KarenRei Oct 26 '15 at 23:58
  • @KarenRei, updated the answer with better performing alternatives. Feel free to exchange map with whatever you feel like. – Surt Oct 27 '15 at 00:14
  • Hmm... I'm not seeing what you've changed in the code. Or do you mean what you wrote after the code? The mutexes aren't the time-eating portion; the fact that every "++iter" requires an O(log N) lookup is. – KarenRei Oct 27 '15 at 01:24