1

Use case: I am converting data from a very old program of mine to a database friendly format. There are parts where I have to do multiple passes over the old data, because in particular the keys have to first exist before I can reference them in relationships. So I thought why not put the incomplete parts in a vector of references during the first pass and return it from the working function, so I can easily use that vector to make the second pass over whatever is still incomplete. I like to avoid pointers when possible so I looked into std::reference_wrapper<T> which seemes like exactly what I need .. except I don't understand it's behavior at all.

I have both vector<OldData> old_data and vector<NewData> new_data as member of my conversion class. The converting member function essentially does:

//...
vector<reference_wrapper<NewData>> incomplete;
for(const auto& old_elem : old_data) {
    auto& new_ref = *new_data.insert(new_data.end(), convert(old_elem));
    if(is_incomplete(new_ref)) incomplete.push_back(ref(new_ref));
}
return incomplete;

However, incomplete is already broken immediately after the for loop. The program compiles, but crashes and produces gibberish. Now I don't know if I placed ref correctly, but this is only one of many tries where I tried to put it somewhere else, use push_back or emplace_back instead, etc. .. Something seems to be going out of scope, but what? both new_data and old_data are class members, incomplete also lives outside the loop, and according to the documentation, reference_wrapper is copyable.

Here's a simplified MWE that compiles, crashes, and produces gibberish:

// includes ..
using namespace std;
int main() {
    int N = 2; // works correctly for N = 1 without any other changes ... ???
    vector<string> strs;
    vector<reference_wrapper<string>> refs;
    for(int i = 0; i < N; ++i) {
        string& sref = ref(strs.emplace_back("a"));
        refs.push_back(sref);
    }
    for (const auto& r : refs) cout << r.get(); // crash & gibberish
}

This is g++ 10.2.0 with -std=c++17 if it means anything. Now I will probably just use pointers and be done, but I would like to understand what is going on here, documentation / search does not seem to help..

Shiwayari
  • 315
  • 3
  • 12
  • `*new.insert` ? `new` is a keyword so this can't be real code. Why would you insert at the end instead of `push_back`? – Ryan Haining Aug 07 '20 at 17:39
  • Godbolt now has sanitizers https://godbolt.org/z/nG4snz – NoSenseEtAl Aug 07 '20 at 17:39
  • and your problem is vector iterator invalidation. https://godbolt.org/z/nzoMKK – NoSenseEtAl Aug 07 '20 at 17:42
  • @RyanHaining oh, of course, sorry I typed this snipped in here without checking correct variable names. Will edit. In that try I use insert to get the reference back that I need to store somewhere. I also tried just push_back and then getting the inserted element with .back(), but it was pretty much the same thing happening – Shiwayari Aug 07 '20 at 17:42
  • Your simplified code is going to exhibit a different problem, which is why N=1 works. Whenever you add to a vector, it may have to grow. If it grows, it is actually allocating a new block of memory and moving all the elements from the old block to the new block. In your example, when you insert the second element the vector is growing to a capacity of 2, and the reference/pointer to the first element is now invalid. – Ryan Haining Aug 07 '20 at 17:43
  • If you're modifying the vector's size in the real code, then it's the same problem, but otherwise you should initialize `strs` before adding the references to `refs` – Ryan Haining Aug 07 '20 at 17:44
  • I see .. that makes sense. I am indeed most likely modifying the size of `new_data` in the real code since it is a large dataset. I guess pointers might have the same problem though I will have play around to understand it completely. I guess I will have to do an extra pass just to find the incompletes then, or somehow obtain the size beforehand, which is probably just as clunky anyway. Thanks for the pointers! – Shiwayari Aug 07 '20 at 17:54
  • This `string& sref = ref(strs.emplace_back("a"));` should not be compilable since `emplace_back` returns nothing. – muaz Aug 07 '20 at 18:24
  • @muaz this does compile as-is (well, minus includes) - `emplace_back` returns the `reference` member as of c++17: https://en.cppreference.com/w/cpp/container/vector/emplace_back – Shiwayari Aug 07 '20 at 18:37

1 Answers1

2

The problem here is that you are using vector data structure which might re-allocate memory for the entire vector any time that you add an element, so all previous references on that vector most probably get invalidated, you can resolve your problem by using list instead of vector.

muaz
  • 550
  • 9
  • 15