
Let's say I form a composite_key of three integer members for a boost::multi_index_container. The keys will span every combination of three integers within some range ({0, 0, 0}, {0, 0, 1}, {0, 0, 2}, etc.). Internally, does Boost store each of these combinations as a distinct key, so that the total number of keys would be N x N x N (where N is the number of values in the range) and the hashed key may be 3x the number of bytes? Or does it try to conserve memory by internally using, say, a tree of hash tables that would cut down on the total index byte size?

I'm trying to figure out if creating the tree of hash tables myself would reduce the overall index byte size.
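For concreteness, the kind of container I'm describing looks roughly like this (a sketch; the struct and member names are just illustrative):

```cpp
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/composite_key.hpp>
#include <boost/multi_index/member.hpp>

namespace bmi = boost::multi_index;

struct Triple { int a, b, c; };  // illustrative names

// Single hashed index whose key is the (a, b, c) composite
using TripleSet = bmi::multi_index_container<
    Triple,
    bmi::indexed_by<
        bmi::hashed_unique<
            bmi::composite_key<
                Triple,
                bmi::member<Triple, int, &Triple::a>,
                bmi::member<Triple, int, &Triple::b>,
                bmi::member<Triple, int, &Triple::c>
            >
        >
    >
>;

int main() {
    TripleSet s;
    const int N = 10;
    for (int a = 0; a < N; ++a)
        for (int b = 0; b < N; ++b)
            for (int c = 0; c < N; ++c)
                s.insert(Triple{a, b, c});  // N*N*N elements
}
```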

Sean

2 Answers


The size/overhead of a hashed index is 2+1/LF pointers per element, where LF is the index load factor. LF typically ranges between MLF/2 and MLF, where MLF is the maximum load factor allowed, by default 1, so the overhead ranges between 2+2=4 and 2+1=3 pointers per element, which is 3.5 pointers per element on average.
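On a typical 64-bit platform with 8-byte pointers, that works out to roughly 24-32 bytes of index overhead per element (about 28 bytes on average), on top of the element itself.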

Note that this overhead is not related at all to the key (extractor) used for the index: neither composite_key nor any other key extractor provided by Boost.MultiIndex stores/caches any kind of information about the key. In your scenario, the combined hash of the three integer data members is computed on the fly every time it is needed.

Joaquín M López Muñoz

What index representation results depends on the type of index. For hashed_unique you can assume the underlying index is organized as a hash table (a bucket array plus linked nodes), much like std::unordered_set; an ordered_unique index is organized as a balanced tree.

However, the main storage need not necessarily be optimal. All Boost.MultiIndex containers are node-based, meaning that elements are not guaranteed to be contiguous in memory (but they always have reference/iterator stability until deletion). This doesn't mean that all nodes must always be allocated separately: this could be optimized by the library and can also be influenced by using a custom allocator.

To cut a long story short, I'd probably consider

  • a boost::container::flat_map (or flat_set) keyed on tuple<int, int, int>, or similar
  • a Boost multi-index container of struct R { int a, b, c; } (or indeed the same tuple) with a suitable pool allocator
  • for comparison purposes you could use the flat_map or a vector as the main storage and put a multi-index container on top of it holding T*, T& or reference_wrapper<T> instead of T. I'm not sure what the reduction in storage would be (likely 2 pointer sizes per element?), but at least it lets you measure the index size separately from the storage.

A flat map gains locality of reference at the cost of iterator/reference invalidation on insertion and erasure; a sketch of the first two options follows.
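A minimal sketch of those two options, assuming Boost.Container and Boost.Pool are available; the flat_set variant (a key-only stand-in for the flat_map suggestion), the struct name R and the choice of fast_pool_allocator are illustrative, not a recommendation:

```cpp
#include <tuple>
#include <boost/container/flat_set.hpp>
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/composite_key.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/pool/pool_alloc.hpp>

namespace bmi = boost::multi_index;

// Option 1: contiguous storage, ordered by the tuple's operator<
using FlatTriples = boost::container::flat_set<std::tuple<int, int, int>>;

// Option 2: node-based multi-index container, with a pool allocator
// (third template parameter) so nodes are carved out of larger blocks
struct R { int a, b, c; };

using PooledTriples = bmi::multi_index_container<
    R,
    bmi::indexed_by<
        bmi::hashed_unique<
            bmi::composite_key<
                R,
                bmi::member<R, int, &R::a>,
                bmi::member<R, int, &R::b>,
                bmi::member<R, int, &R::c>
            >
        >
    >,
    boost::fast_pool_allocator<R>
>;
```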

To measure your actual memory footprint, use a heap profiler (e.g. valgrind --tool=massif).

sehe