How can map operations in one thread be done safely, without invalidate iterators in another?

Question

I have two threads operating on a shared map. One thread (named thread 1) just keeps inserting pairs into the map. The other thread (named thread 2) keeps getting the map's first element, does some operation on the element, and finally erases it from the map. It doesn't matter to thread 2 whether the element it operates on happens to be at the map's beginning after thread 1 inserts an element. However, the map element in thread 2 must not be invalidated by thread 1's insert.

I know that STL containers are not thread-safe on their own, but they can still be quite useful if properly adapted. So my approach is every time thread 2 gets an element, I make a copy of the data and do my work. Also, map methods are made atomic by using a lock_guard on a mutex stored as a member.

pseudocode like below under c++17

my_map {
    insert(value_type value){
        lock_guard(mutex)
        map.insert(value)
    }
    erase(iterator position){
        lock_guard(mutex)
        map.erase(position)
    }
    end(){
        lock_guard(mutex)
        map.end()
    }
    begin(){
        lock_guard(mutex)
        map.begin()
    }
}

thread 1:

while(1){
    sleep(1)
    my_map.insert(random())
}

thread 2:

while(!my_map.empty()){
    auto it = my_map.begin()
    auto k = it->first;
    auto v = it->second;
    work(k, v);
    my_map.erase(it);
}

The locks should prevent the map itself from becoming invalid due to simultaneous mutation. My concern is how map operations might affect iterators and elements of the map. Consider this sequence:

What's the behavior if the sequence in the picture happens? That is to say, might the insertion in thread 1 invalidate it, k or v in thread 2 in some way (such as disturbing the iterator so k and v don't correspond, or destroying the iterator so k or v can't be retrieved)?

I was told using a copy of the map is thread-safe but it wouldn't be ok with the iterator, so it is correct? And how do I make the copy of one element in the map or some other STL containers to achieve thread safety?

Could you show us some code for the scenario you are asking about? — Daniel Langr, Nov 10 '21 at 15:12
Please read [ask] with a [mcve]. Please post all necessary code as formatted text in the question. Links can expire and make the question useless for future readers. Images can not be copied and compiled into a working example / test case. — Richard Critten, Nov 10 '21 at 15:15
@RichardCritten Well, I post the address of my another question.. I just edit this page and paste the psuedocode — f1msch, Nov 10 '21 at 15:20
Assuming `my_map.empty()` is synchronized similarly to other methods, I believe this scenario is thread-safe. `insert` doesn't invalidate any iterators, and doesn't race with accessing an element through an existing iterator. — Igor Tandetnik, Nov 10 '21 at 17:02
@IgorTandetnik I believe that gives me the answer I want. Thanks a lot — f1msch, Nov 11 '21 at 01:41
For const methods and the map's mutex, check out "[C++ mutex and const correctness](https://stackoverflow.com/q/3239905/90527)". — outis, Dec 28 '21 at 04:29

outis · Answer 1 · 2021-12-28T05:10:05.043

Insertion Affects on Thread 2's Iterators/References

For std::map and other ordered associative containers, insertion won't invalidate container iterators (or element references), so it (and it->first and it-second) in thread 2 shouldn't be affected by thread 1's my_map.insert. (Note: this is entirely due to the standard behavior for containers, not the locks. Though the locks ensure my_map.insert(...) in thread 1 and my_map.begin() and my_map.erase(...) won't overlap, they don't affect the interaction of the insertion in thread 1 and accessing through the iterator in thread 2.) It's a different story for the unordered containers, such as std::unordered_set and std::unordered_map, which will invalidate iterators if they must rehash.

Other Operation Affects on Thread 2's Iterators/References

Since thread 1's insertion won't affect it->first or it->second, making local copies isn't necessary. With the addition of a synchronized my_map::extract method (calling map::extract, added by C++17), the extraction can be made atomic (as long as the element doesn't need to remain in the map while the work is done):

my_map::node_type my_map::extract(iterator position) {
    lock_guard<decltype(my_map::mutex)> lock(mutex);
    return map::extract(position);
}

void thread2() {
    while(! my_map.empty()){
        auto element = my_map.extract(my_map.begin())
        work(element.key(), element.value());
    }
}

If other threads were to remove elements from the map, however, it then would be necessary to ensure the data in thread 2 wasn't affected, as thread 2 could be interrupted between the calls to my_map::begin and my_map::extract, and an erasure might invalidate the passed iterator. That should be the only critical point in thread 2, as finding & removing the element from the map both early and atomically should ensure other threads won't be able to erase the node themselves (though thread 2 could affect iterators and references in other threads, which would need to be addressed in their own critical sections).

Protecting Against Erasure

A copy or extraction would need to be protected by e.g. locks to prevent an erasure from interrupting the operation; my_map has just such a lock. Looking at it from another angle, removing and returning the first element is entirely an operation on my_map, and is thus a candidate to be a method.

In other languages, this operation is called pop or shift. There are some similarly named STL containers methods (pop_back and pop), though they remove an element but don't return it, and map doesn't have such a method. Let's call the method shift:

my_map::node_type my_map::shift() {
    lock_guard<decltype(my_map::mutex)> lock(mutex);
    return map::extract(map::begin());
}

void thread2() {
    while(! my_map.empty()){
        auto element = my_map.shift()
        work(element.key(), element.value());
    }
}

Note there are some caveats about the elements returned by extract:

Pointers and references to the extracted element remain valid, but cannot be used while element is owned by a node handle [node_type]: they become usable if the element is inserted into a container.

See "std::unordered_map::extract references/pointers invalidation" for details, but the gist is you can access the data via the node_type::key() and node_type::value() methods (including assigning to them, which is why there's a caveat). The issue in general would be if other threads had a reference to the same element and it were modified via the node handle (which thread 2 doesn't do, but steps would still need to be taken for safety), it could cause issues in those other threads.

shift above is a bit leaky in its return value. This could be addressed by returning a value type instead:

my_map::value_type my_map::shift() {
    lock_guard<decltype(my_map::mutex)> lock(mutex);
    auto node = map::extract(map::begin());
    return value_type(std::move(node.key()), std::move(node.value()));
}

thread2() {
    while(! my_map.empty()){
        auto element = my_map.shift()
        work(element.first, element.second);
    }
}

Since node_type::key() and node_type::value() return lvalues, std::move in my_map::shift is used to convert these to rvalues, so that the returned pair grabs the data from the node, rather than making copies.

Alternatively, a lock could protect the call to work as a critical section, though this would be less performant if work(...) takes an appreciable amount of time. In general, copying/extracting reduces the critical section of code, so other threads don't need to wait for the real work to complete.

Erasure Affects on Thread 1

As thread 1 doesn't operate on elements of the map (only the map itself), there's nothing for thread 2's my_map.erase(...) to affect in that regard. Even if thread 1 did operate on an iterator or reference to an item from my_map, the erase would only invalidate erased elements (thread 1 would need to protect against this, as described above).

How can map operations in one thread be done safely, without invalidate iterators in another?

1 Answers1

Insertion Affects on Thread 2's Iterators/References

Other Operation Affects on Thread 2's Iterators/References

Protecting Against Erasure

Erasure Affects on Thread 1

Linked