
I am developing a time-critical application and am looking for the best container to handle a collection of elements of the following type:

class Element
{
    int  weight;
    Data data;
};

Considering that the time-critical steps of my application, periodically performed in a single thread, are the following:

  • the Element with the lowest weight is extracted from the container, and data is processed;
  • a number n >= 0 of new Elements, each with a random(*) weight, is inserted into the container.

Some Elements in the container may have the same weight. The total number of elements in the container at any time is quite high and almost stationary on average (several hundred thousand). The time needed for the extract/process/insert sequence described above must be as low as possible. (Note (*): the new weight is actually computed from data but is considered random here to simplify.)

After some searching and trying different STL containers, I ended up using the std::multiset container, which performed about 5 times faster than an ordered std::vector and 16 times faster than an ordered std::list. But still, I am wondering whether I could achieve even better performance, considering that the bottleneck of my application remains in the extract/insert operations.

Notice that, though I only tried ordered containers, I did not mention "ordered container" in my requirements. I do not need the Elements to be ordered in the container; I only need to perform the "extract lowest weighted element"/"insert new elements" operations as fast as possible. I am not limited to STL containers and can go for Boost, or any other implementation, if suited.

Thanks for help.

shrike
    Can you elaborate on the "time critical" parts? Is it only the collection of the structures that needs to happen as fast as possible? Storing them in a container? The processing of them? A mix? Something else? – Some programmer dude Mar 25 '16 at 17:42
  • @Joachim: Application is time critical in the sense that the whole extract/process/insert sequence has to be executed in a finite and as short as possible time. – shrike Mar 27 '16 at 14:19

4 Answers


I do not need the Element to be ordered in the container, I only need to perform the "extract lowest weighted element"/"insert new elements" operations as fast as possible.

Then you should try priority_queue<T>, or use make_heap/push_heap/pop_heap operations on a vector<T>.

Since you are looking for a min-heap, not a max-heap, you would need to supply a custom comparator that orders your Element objects in reverse.

Sergey Kalinichenko
    I tried using std::priority_queue, following your 1st proposition. Using this container, execution time of the extract/insert operations is twice as fast as the std::multiset solution. This is great! I gave a try to your 2nd proposition too (vector/make_heap), and I got slightly lower performance than priority_queue (and the code was less straightforward to write). Preference for the 1st proposition then. Thank you very much for your answer and the excellent solution to my problem. – shrike Mar 27 '16 at 14:20

I think that, within the STL, a lazy std::vector will give the best results.

Suggested pseudocode may look like:

  • emplace new elements at the back of the vector
  • only when you want the smallest element, sort the vector and take the first element

This way, you get the amortized O(1) insertion time of the vector, a relatively small number of memory allocations, and good cache locality.

David Haim
  • so what is the bottleneck of your application? – David Haim Mar 27 '16 at 14:42
  • I am confused, but I made a mistake during my tests: I thought I was testing your solution while I was actually testing the 2nd solution of dasblinkenlight (#define mistake). Using your lazy vector solution, I actually got dramatically bad performance. Surprisingly, the performance loss is neither in the extract nor in the insert operation (sort seems to run quite quickly) but in the process operation itself, which takes 10 times as long as usual. Such process performs only computation and does not perform any operation on the container (neither extract nor insert). – shrike Mar 27 '16 at 16:16
  • if you'll paste everything you have tried, we will manage to help you more – David Haim Mar 27 '16 at 16:22

It is instructive to consider different candidates and how your assumptions would impact the final selection. When your requirements change, it then becomes easier to switch containers.

Generally, containers of size N fall into roughly three complexity categories for their basic access/modification operations: (amortized) O(1), O(log N), and O(N).

Your first requirement (finding the lowest weight element) gives you roughly three candidates with O(1) complexity, and one candidate with O(N) complexity per element:

  1. O(1) for std::priority_queue<Element, LowestWeightCompare>

  2. O(1) for std::multiset<Element, LowestWeightCompare>

  3. O(1) for boost::flat_multiset<Element, LowestWeightCompare>

  4. O(N) for std::unordered_multiset<Element>

Your second requirement (randomized insertion of new elements) gives you the following complexity per element for each of the above four choices:

  1. O(log N) for std::priority_queue

  2. O(log N) for std::multiset

  3. O(N) for boost::flat_multiset

  4. amortized O(1) for std::unordered_multiset

Among the first three choices, boost::flat_multiset should be dominated by the other two for large N. Among the remaining two, the better caching behavior of std::priority_queue over std::multiset might prevail. But: measure, measure, measure.

It is a priori ambiguous whether std::unordered_multiset is competitive with the other three. Depending on the number n of randomly inserted elements, the total cost per batch of find(1)-insert(n) would be O(N) search + O(n) insertion for std::unordered_multiset and O(1) search + O(n log N) insertion for std::multiset. Again, measure, measure, measure.

How robust are these considerations with respect to your requirements? The story would change as follows if you would have to find the k lowest weight elements in each batch. Then you'd have to compare the costs of find(k)-insert(n). The search costs would scale roughly as

  1. O(k log N) for std::priority_queue
  2. O(1) for std::multiset
  3. O(1) for boost::flat_multiset
  4. O(k N) for std::unordered_multiset

Note that a priority_queue can only efficiently access the top element, not the k top elements, without actually calling pop() on them, which has O(log N) complexity per call. If you expect that your code is likely to change from a find(1)-insert(n) batch mode to find(k)-insert(n), then it might be a good idea to choose std::multiset, or at least document what kind of interface changes it would require.

Bonus: the best of both worlds?! You might also want to experiment a bit with Boost.MultiIndex and use something like (check the documentation to get the syntax correct)

boost::multi_index_container<
    Element, 
    indexed_by<
        ordered_non_unique<member<Element, int, &Element::weight>, std::less<int>>,
        hashed_non_unique<member<Element, int, &Element::weight>>
    >
>

The above will create a node-based container that implements two pointer structures to keep track of both the ordering by Element weight and the hash buckets that allow quick hashed insertion. This allows O(1) lookup of the lowest weight Element and also O(n) random insertion of n new elements.

For large N, it should scale better than the four previously mentioned containers, but again, for moderate N, cache effects induced by pointer chasing into random memory might spoil its theoretical advantage over std::priority_queue. Did I mention the mantra of measure, measure, measure?

TemplateRex
    Thank you very much for this analysis of my problem and the detailed solutions you proposed; I will give a try to `multi_index` as `N` is actually quite large (for now, `N`~200k elements almost stationary and I'd like to be able to manage even more). Just a comment about my **1st requirement**: it's not only to **find** the lowest weight element but also to **extract** it from the container, so `O(log N)`, if I am not mistaken, for `priority_queue` and `multiset` (for the others, I don't know). Again, thanks and, for sure, I included accurate time **measurement** into my code. – shrike Mar 31 '16 at 11:40

Try either of these:

std::map<int,std::vector<Data>>

or

std::unordered_map<int,std::vector<Data>>

The int above is the weight.

Both have different speeds for find, remove, and add, depending on many factors, such as whether the element is present. (If it is, unordered_map's find is faster; if not, map's find is faster.)

Cmoraski
  • As far as I know, the implementation of std::map generally uses a binary tree, the same as std::multiset. Are you sure I can expect a significant performance increase using it? And as for std::unordered_map, I suppose this could be a solution if the weights were known prior to extraction, which is not the case. – shrike Mar 27 '16 at 14:21
  • how are you going to find the element with the lowest weight? – TemplateRex Mar 29 '16 at 10:43