
I'm working with a vector of elements that need to be selected at random and effectively removed until either a condition is met or all the elements have been selected. However, they won't actually be removed until some later stage in code execution, so I need to maintain a list of valid, available elements. I can either erase elements from this second vector or recreate it each time. Below is a minimal version of my code showing the variant where the vector is recreated on each pass of a while loop:

    Random mRandom; // Pseudo-random number generator
    std::vector< Element* > mElements;
    for( unsigned index = 0; index < ARBITRARY_VALUE; index++ )
      mElements.push_back( new Element( ) );

    std::vector< bool > removedElements( mElements.size( ), false );
    bool condition = true;

    while( condition == true ) {
      std::vector< unsigned > availableIndices;

      for( unsigned index = 0; index < mElements.size( ); index++ ) {
        if( removedElements[ index ] == false )
          availableIndices.push_back( index );
      }

      if( availableIndices.size( ) > 0 ) {
        unsigned maximum = availableIndices.size( ) - 1;
        unsigned randomIndex = mRandom.GetUniformInt( maximum ); // Zero to max
        removedElements[ availableIndices[ randomIndex ] ] = true;
        Element* element = mElements[ availableIndices[ randomIndex ] ];
        condition = element->DoStuff( ); // May change condition and exit while
      } else
        break;
    }

It's clear that erasing an element in the middle of a vector requires the underlying system to iterate through the remaining elements and 'move' them to their new, valid positions. Obviously that means fewer moves when the erased elements are near the end of the vector.

I've read a few posts regarding the costs associated with erasing vector elements, but I haven't seen anything that directly addresses my question: does the process of 'moving' elements following an erasure introduce overheads that could make it cheaper to iterate through all of the elements every time, creating a new vector that points to the valid ones, as in my code example above?
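
For reference, here's a minimal sketch of the erase-based alternative I'm weighing, using the same `Random` and `Element` types as above:

    std::vector< Element* > availableElements = mElements; // Copy the pointers once
    bool condition = true;

    while( condition == true && !availableElements.empty( ) ) {
      unsigned maximum = availableElements.size( ) - 1;
      unsigned randomIndex = mRandom.GetUniformInt( maximum ); // Zero to max
      Element* element = availableElements[ randomIndex ];
      // Erasing shifts every pointer after randomIndex one slot to the left
      availableElements.erase( availableElements.begin( ) + randomIndex );
      condition = element->DoStuff( ); // May change condition and exit while
    }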

Cheers, Phil

phlndrwd
    Looks like you want `std::stable_partition` to partition the elements that will eventually be removed. – PaulMcKenzie Jun 03 '17 at 12:53
    Is the order in `mElements` important? If not, then you can "remove" an element simply with `std::swap(mElements[randomIndex], mElements[--cur_size]);` (where `cur_size` is initialized with `mElements.size()` before the loop). In other words, move the "removed" elements to the end, ignore them in further processing. You can `erase` them all at once at the end, if desired. – Igor Tandetnik Jun 03 '17 at 13:28
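
A minimal sketch of the swap-based removal described in the comment above, assuming the order of `mElements` doesn't matter (`Random` and `Element` as in the question; `std::swap` lives in `<utility>`):

    size_t cur_size = mElements.size( );
    bool condition = true;

    while( condition == true && cur_size > 0 ) {
      unsigned randomIndex = mRandom.GetUniformInt( cur_size - 1 ); // Zero to max
      condition = mElements[ randomIndex ]->DoStuff( );
      // Move the 'removed' element past the live range: O(1), no shifting
      std::swap( mElements[ randomIndex ], mElements[ --cur_size ] );
    }
    // If desired, physically erase them all at once afterwards:
    // mElements.erase( mElements.begin( ) + cur_size, mElements.end( ) );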

2 Answers


I can't comment on the best way to solve your problem because I'm not yet sure what the actual requirements of the function or algorithm are (e.g. should elements retain their order? Will unavailable elements become available again? If they do, will ordering matter then? And so on).

However, regarding the last question:

Does the process of 'moving' elements following an erasure introduce overheads that could make it cheaper to iterate through all of the elements every time by creating a new vector that points to the valid ones?

This completely depends on what's involved in moving an element. If it's a pointer, as above, then you could move a lot of elements before even coming close to the cost of allocating memory in a new vector. And by 'a lot' I am thinking hundreds or even thousands.

In the code above, it looks as if the vector of availability flags is redundant: an Element pointer is available if its index is in availableIndices.

If I understand the intent correctly, I think I might refactor along the following lines:

#include <vector>
#include <random>

struct Element
{
  bool doStuff(); // defined elsewhere; returns false once the stopping condition is met
};


struct ElementAvailability
{
  ElementAvailability(std::vector<Element*> const& storage)
  : storage_(storage)
  {}

  void resync()
  {
    // will require an allocation at most once if storage_ does not grow
    available_ = storage_;
  }

  std::size_t availableCount() const {
    return available_.size();
  }

  Element* removeAvailable(std::size_t index) {
    auto pe = available_[index];
    available_.erase(std::begin(available_) + index);
    return pe;
  }

  // like removeAvailable, but without returning the element
  void makeUnavailable(std::size_t available_i)
  {
    available_.erase(std::next(std::begin(available_), available_i));
  }

private:
  std::vector<Element*> const& storage_;
  std::vector<Element*> available_;
};

// I have used a std random engine because I don't know your library
auto eng = std::default_random_engine(std::random_device()());

void test(std::vector<Element*> const& elems)
{
  auto available = ElementAvailability(elems);
  available.resync(); // snapshot the available set once, before the loop

  bool condition = true;
  auto getCount = [&condition, &available]() -> std::size_t
  {
    // once condition goes false, report zero so the loop ends
    return condition ? available.availableCount() : 0;
  };

  while (auto count = getCount()) {
    auto range = std::uniform_int_distribution<std::size_t>(0, count - 1);
    auto index = range(eng);
    auto candidate = available.removeAvailable(index);
    condition = candidate->doStuff();
  }
}
Richard Hodges

The problem of random elimination of elements as you present it seems to me solvable only in O(n^2) time and O(n) space, because you have to pass over all the elements once, and on each pass you have to find a random index within the sequence of still-existing elements and then maintain that sequence. There might be a few approaches built on different algorithmic primitives. Below I present my solution, which achieves this goal in a CPU- and memory-friendly way.

void runRandomTasks() {
  Random mRandom; // Pseudo-random number generator
  std::vector<Element*> mElements;
  for (unsigned index = 0; index < ARBITRARY_VALUE; ++index) {
    mElements.push_back(new Element);
  }
  size_t current_size = mElements.size();
  if (!current_size)
    return;
  std::vector<Element*> current_elements(mElements); // working copy of the pointers
  Element** last_ptr = &current_elements[0] + current_size - 1;

  bool condition = true;

  while (condition && current_size) {
    unsigned random_size = mRandom.GetUniformInt(current_size - 1) + 1; // 1 to current_size
    // Walk backwards to the random_size-th still-live (non-null) element
    Element** ptr = last_ptr;
    while (true) {
      random_size -= (bool)(*ptr); // only non-null slots count
      if (random_size) {
        --ptr;
      } else {
        break;
      }
    }

    condition = (*ptr)->DoStuff(); // May change condition and exit while
    *ptr = nullptr;                // Mark as removed without shifting anything
    --current_size;
  }
}

Concerning your question about vector element erasure: in my solution the loop that finds a random index is equivalent in time complexity to erasing an element from a vector, but with a smaller constant, because the element shifting is omitted; instead each element is simply checked for being null. Doing any memory allocation inside a loop is always costly, so avoid that.
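
If you keep the rebuild-a-vector approach from your question, one way to follow that last piece of advice is to hoist the scratch vector out of the loop and clear() it on each pass, so its capacity is reused instead of reallocated (a sketch using your variable names):

std::vector<unsigned> availableIndices; // allocated once, outside the loop
while (condition) {
  availableIndices.clear(); // keeps the capacity; no free/alloc
  for (unsigned index = 0; index < mElements.size(); ++index) {
    if (!removedElements[index])
      availableIndices.push_back(index); // grows only during the first pass
  }
  if (availableIndices.empty())
    break;
  unsigned randomIndex = mRandom.GetUniformInt(availableIndices.size() - 1); // Zero to max
  removedElements[availableIndices[randomIndex]] = true;
  condition = mElements[availableIndices[randomIndex]]->DoStuff();
}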

Yuki