Iterating over vector in one thread while other may potentially change reference

Question

I have a vector allow_list that is periodically updated in a thread while another serves a function that checks if a certain string is in that allow_list via:

if (std::find(allow_list->begin(), allow_list->end(), target_string) != allow_list->end()){
    allow = true;
}

Now, the other thread may do something like this

// Some operation to a vector called allow_list_updated
allow_list = allow_list_updated;

Should I add a mutex here to lock and unlock before and after these operations? My intuition tells me it's "ok" and shouldn't crash and burn but this seems like undefined behavior to me.

When assigning, the previous contents of the (shared) object get destroyed. As such, any and all iterators will be invalidated, leading to a nice little crash. To test it, you can iterate over the container in one thread and implement a 'sleep', while the other thread manipulates the list. — Refugnic Eternium, Feb 14 '23 at 06:16
find is effectively a loop iterating through the vector. AFAICT, you are setting up a scenario where you potentially replace the entire array managed by allow_list with a new one in the middle of the loop, invalidating all old iterators and releasing the memory of the old one that the now invalid iterators used to access. Seems like "crash and burn" is exactly what this is about. — Avi Berger, Feb 14 '23 at 06:17
I am confused. So what if iterators get invalidated? Every time find is called, it will call begin() and end() on allow_list and the find result will be based on allow_list. — kiner_shah, Feb 14 '23 at 06:24
find is a C++ source code construct. It is not a machine instruction or atomic. It is a short hand for a loop that keeps updating an iterator and checking a condition. If it has 100 elements to iterate through and once it gets through the first 50, your other thread zaps the iterators and kills the array, BOOM! This is dependent on timing. Just because it doesn't occur in one (or 200) runs, doesn't mean it can't happen in the next. That's why we have concurrency control. — Avi Berger, Feb 14 '23 at 07:01

user10 · Accepted Answer · 2023-02-14T13:17:23.233

You have race condition and you need to lock. Simple rule if thread can read variable with non atomic write from another you have race on that variable. Another problem you need to lock all vector. If you have lots of reads and rare writes std::shared_mutex might be good idea. If allow_list that is periodically updated only from the edges, list would be better option for allow_list = allow_list_updated since to swap list you need to swap head and tail. Another potential advantage of list is lack of false sharing. Whatever you do your container and its protection should be in one class.

score -1 · Answer 2 · answered Feb 14 '23 at 06:37

when you update a vector, all iterators become invalid. the reason for this is because the vector may reallocate the contents in memory. it may work sometimes, but eventually will segfault when you access an item that moved. the other reason is if you delete elements then your iterator could be pointing out of bounds or skip over entries. so you definitely need to perform some locking in both threads. which kind of lock depends on the rest of your code

i would also recommend either std::swap or std::move instead of allow_list = allow_list_updated; depending on if allow_list_updated can be discarded after the change; it's much faster. if you're updating this list frequently you'll probably want to use std::swap and keep the two lists in scope somewhere and just .clear() and std::swap() them each update. this will combat memory fragmentation. example:

class updater
{
public:
     std::vector<std::string> allowed;
     std::vector<std::string> allowed_updated;

     void update()
     {
          // @TODO: do your update to this->allowed_updated, use this->allowed_updated.reserve() if you know how many items there will be

          std::swap(this->allowed, this->allowed_updated);
          this->allowed_updated.clear();
     }
};

Thanks, didn't know about the swap function, but, is this thread safe? The allowed_updated can be discarded, I only care about the allow_list itself. — Baiqing, Feb 14 '23 at 06:51
In the internal implementation, swap(move internally) is used by both standard::vector and boost::vector. So operator= is actually std::swap for vectors. (old vector is garbage now) , something like: `_M_move_assign(std::move(__x), __bool_constant<__move_storage>());` — H.M, Feb 14 '23 at 07:03
One, std::swap() is not atomic. Two, it can sneak between the calls to begin() and end() in the other thread. Three, you are clearing the vector that the other thread is currently iterating. — n. m. could be an AI, Feb 14 '23 at 13:34

Iterating over vector in one thread while other may potentially change reference

2 Answers2