
I've got the following problem. I have a game which runs on average 60 frames per second. Each frame I need to store values in a container and there must be no duplicates.

It will probably store fewer than 100 items per frame, but the number of insert calls will be a lot higher (and many will be rejected because the values must be unique). I only need to traverse the container at the end of each frame. So that is about 60 traversals of the container per second, but a lot more insertions.

Keep in mind that the items to store are simple integers.

There are a bunch of containers I can use for this but I cannot make up my mind what to pick. Performance is the key issue for this.

Some pros/cons that I've gathered:


vector

  • (PRO): Contiguous memory, a huge factor.
  • (PRO): Memory can be reserved first, very few allocations/deallocations afterwards
  • (CON): No alternative to traversing the container (std::find) on each insert() to check for duplicates? The comparison is simple though (integers), and the whole container can probably fit in the cache

set

  • (PRO): Simple, clearly meant for this
  • (CON): Not constant insert-time
  • (CON): A lot of allocations/deallocations per frame
  • (CON): Not contiguous memory. Traversing a set of hundreds of objects means jumping around a lot in memory.

unordered_set

  • (PRO): Simple, clearly meant for this
  • (PRO): Average case constant time insert
  • (CON): Seeing as I store integers, the hash operation is probably a lot more expensive than anything else
  • (CON): A lot of allocations/deallocations per frame
  • (CON): Not contiguous memory. Traversing a set of hundreds of objects means jumping around a lot in memory.

I'm leaning towards the vector route because of memory access patterns, even though set is clearly meant for this problem. What is unclear to me is whether traversing the vector on each insert is more costly than the allocations/deallocations (especially considering how often they must be done) and the memory lookups of set.

I know it ultimately comes down to profiling each case, but as a head start, or just theoretically, what would probably be best in this scenario? Are there any pros/cons I might have missed as well?

EDIT: I didn't mention that the container is cleared (clear()) at the end of each frame.
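
As a rough illustration of the vector option combined with the per-frame clear(), here is a minimal sketch. The class name `FrameUniqueInts` and its members are hypothetical, not something from the question. It relies on the fact that `std::vector::clear()` leaves the capacity unchanged, so a single `reserve()` up front can serve every later frame as long as the reserved size is not exceeded.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: a per-frame collection of unique ints backed by a vector.
class FrameUniqueInts {
public:
    // 'expected' is an assumed upper bound on unique values per frame.
    explicit FrameUniqueInts(std::size_t expected) { values_.reserve(expected); }

    // Linear scan for duplicates; returns true if the value was new this frame.
    bool insert(int value) {
        if (std::find(values_.begin(), values_.end(), value) != values_.end())
            return false;
        values_.push_back(value);
        return true;
    }

    // Read-only access for the end-of-frame traversal.
    const std::vector<int>& values() const { return values_; }

    // clear() destroys the elements but leaves the capacity unchanged,
    // so the next frame reuses the same buffer.
    void end_frame() { values_.clear(); }

    std::size_t capacity() const { return values_.capacity(); }

private:
    std::vector<int> values_;
};
```

Whether the linear scan is cheap enough still has to be measured; the sketch only shows that the allocation cost can be paid once rather than every frame.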

KaiserJohaan
  • ***Just measure it.*** Given that `unordered_set` is **the** classic "set" container, with unordered-no-duplicate semantics and best asymptotic complexity, I'd give it a shot, but chances are `vector` will beat it for small container sizes, since it has far better cache locality properties. – The Paramagnetic Croissant Feb 27 '15 at 14:23
  • What about providing your own allocator, that is able to overcome the inefficiencies in memory management? (e.g. providing an object pool) – πάντα ῥεῖ Feb 27 '15 at 14:23
  • Whatever you do, try to properly encapsulate your code and use `auto` to track types so you can easily change your choice of container in the future. Then measure. – Chris Drew Feb 27 '15 at 14:24
  • If you know the range of values that will be stored, you could put them in their own vector, then use [random_shuffle](http://www.cplusplus.com/reference/algorithm/random_shuffle/) on the container and just take the first 100 elements from it. – NathanOliver Feb 27 '15 at 15:03
  • A `set` will probably be dominated by branch mispredictions. The complete data fits into L1 and allocations can be avoided (block allocator). However, `set` is a tree structure, so several branches with a ~50% chance of misprediction per insert are unavoidable. – Damon Feb 27 '15 at 15:11
  • What's the range of your values? Consider using a `bitset` if the range is limited to a few thousand. – Sebastian Redl Feb 27 '15 at 16:13
  • Store the data in a vector first, then use another container to store pointers into that vector. This means iterating through the vector to store the pointers, but you only need to do that once per frame; traversing the container of pointers should then be cheaper, depending on the container used. – ReturnVoid Oct 23 '21 at 02:59
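
The `bitset` idea from the comments can be sketched roughly as follows, under the assumption that every value lies in a known range `[0, kMaxValue)`. The bound of 4096, the class name `BitsetUniqueInts`, and the choice to reject out-of-range values are all illustrative assumptions. A side vector records the accepted values so that the end-of-frame traversal does not have to scan the whole bitset.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Assumed upper bound on the values; adjust to the real range.
constexpr std::size_t kMaxValue = 4096;

// Hypothetical helper: O(1) duplicate check via one flag bit per possible value.
class BitsetUniqueInts {
public:
    BitsetUniqueInts() { order_.reserve(128); }

    // Returns true if the value was accepted (in range and not seen before).
    bool insert(int value) {
        if (value < 0 || static_cast<std::size_t>(value) >= kMaxValue)
            return false;  // out of the assumed range: rejected in this sketch
        const std::size_t index = static_cast<std::size_t>(value);
        if (seen_.test(index))
            return false;  // duplicate
        seen_.set(index);
        order_.push_back(value);
        return true;
    }

    // Unique values in insertion order, stored contiguously.
    const std::vector<int>& values() const { return order_; }

    // Reset only the bits that were set, then drop the values.
    void clear() {
        for (int v : order_) seen_.reset(static_cast<std::size_t>(v));
        order_.clear();
    }

private:
    std::bitset<kMaxValue> seen_;
    std::vector<int> order_;
};
```

This trades a fixed amount of memory (one bit per possible value) for a constant-time membership test with no hashing and no per-insert allocation.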

3 Answers


I timed a few different methods that I thought were likely candidates. std::unordered_set was the winner.

Here are my results:

Using UnorderedSet:    0.078s
Using UnsortedVector:  0.193s
Using OrderedSet:      0.278s
Using SortedVector:    0.282s

Timing is based on the median of five runs for each case.

compiler: gcc version 4.9.1
flags:    -std=c++11 -O2
OS:       ubuntu 4.9.1
CPU:      Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz

Code:

#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <numeric>
#include <random>
#include <set>
#include <unordered_set>
#include <vector>

using std::cerr;
static const size_t n_distinct = 100;

template <typename Engine>
static std::vector<int> randomInts(Engine &engine,size_t n)
{
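  // The upper bound is inclusive, so n_distinct-1 yields exactly n_distinct possible values.
  auto distribution = std::uniform_int_distribution<int>(0,static_cast<int>(n_distinct)-1);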
  auto generator = [&]{return distribution(engine);};
  auto vec = std::vector<int>();
  std::generate_n(std::back_inserter(vec),n,generator);
  return vec;
}


struct UnsortedVectorSmallSet {
  std::vector<int> values;
  static const char *name() { return "UnsortedVector"; }
  UnsortedVectorSmallSet() { values.reserve(n_distinct); }

  void insert(int new_value)
  {
    auto iter = std::find(values.begin(),values.end(),new_value);
    if (iter!=values.end()) return;
    values.push_back(new_value);
  }
};


struct SortedVectorSmallSet {
  std::vector<int> values;
  static const char *name() { return "SortedVector"; }
  SortedVectorSmallSet() { values.reserve(n_distinct); }

  void insert(int new_value)
  {
    auto iter = std::lower_bound(values.begin(),values.end(),new_value);
    if (iter==values.end()) {
      values.push_back(new_value);
      return;
    }
    if (*iter==new_value) return;
    values.insert(iter,new_value);
  }
};

struct OrderedSetSmallSet {
  std::set<int> values;
  static const char *name() { return "OrderedSet"; }
  void insert(int new_value) { values.insert(new_value); }
};

struct UnorderedSetSmallSet {
  std::unordered_set<int> values;
  static const char *name() { return "UnorderedSet"; }
  void insert(int new_value) { values.insert(new_value); }
};



int main()
{
  //using SmallSet = UnsortedVectorSmallSet;
  //using SmallSet = SortedVectorSmallSet;
  //using SmallSet = OrderedSetSmallSet;
  using SmallSet = UnorderedSetSmallSet;

  auto engine = std::default_random_engine();

  std::vector<int> values_to_insert = randomInts(engine,10000000);
  SmallSet small_set;
  namespace chrono = std::chrono;
  // steady_clock is monotonic, which makes it the safer choice for measuring intervals.
  using chrono::steady_clock;
  auto start_time = steady_clock::now();
  for (auto value : values_to_insert) {
    small_set.insert(value);
  }
  auto end_time = steady_clock::now();
  auto& result = small_set.values;

  auto sum = std::accumulate(result.begin(),result.end(),0u);
  auto elapsed_seconds = chrono::duration<float>(end_time-start_time).count();

  cerr << "Using " << SmallSet::name() << ":\n";
  cerr << "  sum=" << sum << "\n";
  cerr << "  elapsed: " << elapsed_seconds << "s\n";
}
Vaughn Cato

I'm going to put my neck on the block here and suggest that the vector route is probably the most efficient when the size is 100 and the objects being stored are integral values. The simple reason for this is that set and unordered_set allocate a node for each new element, whereas the vector needs to allocate only once.

You can increase search performance dramatically by keeping the vector ordered, since then all searches can be binary searches and therefore complete in O(log2 N) time.

The downside is that the inserts will take a tiny fraction longer due to the memory moves, but it sounds as if there will be many more searches than inserts, and moving (average) 50 contiguous memory words is an almost instantaneous operation.

Final word: Write the correct logic now. Worry about performance when the users are complaining.

EDIT: Because I couldn't help myself, here's a reasonably complete implementation:

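#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <utility>
#include <vector>

template<typename T>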
struct vector_set
{
    using vec_type = std::vector<T>;
    using const_iterator = typename vec_type::const_iterator;
    using iterator = typename vec_type::iterator;

    vector_set(size_t max_size)
    : _max_size { max_size }
    {
        _v.reserve(_max_size);
    }

    /// @returns: pair of iterator, bool
    /// If the value has been inserted, the bool will be true
    /// the iterator will point to the value, or end if it wasn't
    /// inserted due to space exhaustion
    auto insert(const T& elem)
    -> std::pair<iterator, bool>
    {
        if (_v.size() < _max_size) {
            auto it = std::lower_bound(_v.begin(), _v.end(), elem);
            if (_v.end() == it || *it != elem) {
                return std::make_pair(_v.insert(it, elem), true);
            }
            return std::make_pair(it, false);
        }
        else {
            return std::make_pair(_v.end(), false);
        }
    }

    auto find(const T& elem) const
    -> const_iterator
    {
        auto vend = _v.end();
        auto it = std::lower_bound(_v.begin(), vend, elem);
        if (it != vend && *it != elem)
            it = vend;
        return it;
    }

    bool contains(const T& elem) const {
        return find(elem) != _v.end();
    }

    const_iterator begin() const {
        return _v.begin();
    }

    const_iterator end() const {
        return _v.end();
    }


private:
    vec_type _v;
    size_t _max_size;
};

using namespace std;


BOOST_AUTO_TEST_CASE(play_unique_vector)
{
    vector_set<int> v(100);

    for (size_t i = 0 ; i < 1000000 ; ++i) {
        v.insert(int(random() % 200));
    }

    cout << "unique integers:" << endl;
    copy(begin(v), end(v), ostream_iterator<int>(cout, ","));
    cout << endl;

    cout << "contains 100: " << v.contains(100) << endl;
    cout << "contains 101: " << v.contains(101) << endl;
    cout << "contains 102: " << v.contains(102) << endl;
    cout << "contains 103: " << v.contains(103) << endl;
}
Richard Hodges
  • For what it's worth, if you're using Boost already, this sort of container is available in the [Boost.Container](http://www.boost.org/doc/libs/1_57_0/doc/html/container.html) library as `flat_set`. It also has a `flat_map`. – Jason R Feb 27 '15 at 15:19
  • Good point! I'm surprised it didn't occur to me to look in boost first. Since c++14 I seem to have forgotten how to use boost... – Richard Hodges Feb 27 '15 at 15:25

As you said you have many insertions and only one traversal, I'd suggest using a vector and pushing the elements in regardless of whether they are already present. Each push is amortized O(1).

Only when you need to go through the vector, sort it and remove the duplicate elements. I believe this can be done in O(n) as they are bounded integers.

EDIT: Sorting in linear time is possible with counting sort, as presented in this video. If that is not feasible, you are back to O(n lg(n)).

You will have very few cache misses because of the contiguity of the vector in memory, and very few allocations (especially if you reserve enough memory in the vector).
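
A minimal sketch of the deferred approach, using the general-purpose O(n lg(n)) path with `std::sort` and `std::unique` rather than counting sort. The function name `sort_and_dedupe` is made up for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: call once at the end of the frame, before traversal.
// During the frame, values are simply push_back'ed without any duplicate check.
inline void sort_and_dedupe(std::vector<int>& values) {
    // Sorting places equal values next to each other.
    std::sort(values.begin(), values.end());
    // std::unique moves the first of each run of equal values to the front and
    // returns the new logical end; erase() then drops the leftover tail.
    values.erase(std::unique(values.begin(), values.end()), values.end());
}
```

One thing to keep in mind with this design is that the vector grows with the number of insert calls rather than with the number of unique values, so the amount to reserve depends on how many inserts a frame performs.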

qdii