unordered_set vs vector -- prefer idiomatic or performant?

Question

I'm working with data in which every item is distinct from every other item of the same type. Very abstractly, a set fits the definition of the data I'm working with, so I feel inclined to use std::unordered_set instead of std::vector.

Beyond that, both classes fit my requirements. My question is about performance -- which is likely to perform better? I cannot write the code one way, benchmark it, and then rewrite it the other way; that would take me hundreds of hours. If they perform similarly, do you think it would be worthwhile to stick with the idiomatic unordered_set?

Here is a simpler use case. A company is selling computers. Each one is guaranteed to differ from every other in at least one way.

struct computer_t
{
    std::string serial;
    std::uint32_t gb_of_ram;
};
std::unordered_set<computer_t> all_computers_in_existence;
std::unordered_set<computer_t> computers_for_sale; // subset of above
// alternatively
std::vector<computer_t> all_computers_in_existence;
std::vector<computer_t> computers_for_sale; // subset of above

The company wants to stop selling computers that aren't popular and replace them with other computers that might be.

std::unordered_set<computer_t> computers_not_for_sale;
std::set_difference(all_computers_in_existence.begin(), all_computers_in_existence.end(),
                    computers_for_sale.begin(), computers_for_sale.end(),
                    std::inserter(computers_not_for_sale, computers_not_for_sale.end()));

calculate_and_remove_least_sold(computers_for_sale);
calculate_and_add_most_likely_to_sell(computers_for_sale, computers_not_for_sale);

Based on the above sample code, what should I choose? Or is there another, new STL feature (in C++17) I should investigate? This really is as generic as it gets for my use-case without making this post incredibly long with details.

Hundreds of hours? Really? So I assume you cannot write a wrapper class that provides the required operations and define a type alias that lets you easily swap between the two alternatives? — fabian, Feb 05 '22 at 08:28
If it is as hard to convert from one to the other as you say it is, then you are using operations other than "add a new element" and "report the number of elements". Presumably, one of these operations is "set difference". What are the other ones? Because it is only possible to answer your question knowing what operations need to have efficient implementations. — rici, Feb 05 '22 at 22:33
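A minimal sketch of the wrapper-plus-alias idea from fabian's comment, assuming C++17 and assuming that computers are identified by their serial string alone. The names basic_inventory, inventory, add, remove and contains are hypothetical and do not come from the question; the real set of operations depends on what the code actually needs, as rici points out.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <type_traits>
#include <unordered_set>
#include <vector>

// Hypothetical wrapper: callers only see add/remove/contains/size, so the
// underlying container can be swapped by editing a single alias.
template <typename Container>
class basic_inventory
{
public:
    // Adds a serial if it is not already present; returns true if added.
    bool add(const std::string& serial)
    {
        if (contains(serial))
            return false;
        items_.insert(items_.end(), serial); // position (vector) or hint (set)
        return true;
    }

    // Removes a serial; returns true if it was present.
    bool remove(const std::string& serial)
    {
        auto it = find(serial);
        if (it == items_.cend())
            return false;
        items_.erase(it);
        return true;
    }

    bool contains(const std::string& serial) const
    {
        return find(serial) != items_.cend();
    }

    std::size_t size() const { return items_.size(); }

private:
    // Linear search for a vector, hash lookup for an unordered_set.
    typename Container::const_iterator find(const std::string& serial) const
    {
        if constexpr (std::is_same_v<Container, std::vector<std::string>>)
            return std::find(items_.cbegin(), items_.cend(), serial);
        else
            return items_.find(serial);
    }

    Container items_;
};

// Change this one line to benchmark the other container.
using inventory = basic_inventory<std::unordered_set<std::string>>;
// using inventory = basic_inventory<std::vector<std::string>>;
```

With this shape, the rest of the program refers only to inventory, so switching containers for a benchmark is a one-line change rather than a rewrite.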

John Zwinck · Answer 1 · 2022-02-05T07:56:32.620

2

Idiomatic should be your first choice. If you implement it using unordered_set and the performance is not good enough, there are faster non-STL hash tables which are easy to switch to. 99% of the time it won't come to that.

Your example code using std::set_difference will not work, because that algorithm requires both input ranges to be sorted, and the iteration order of an unordered_set is unspecified. That's OK though: subtracting is easily done by calling unordered_set::erase(key) for each element you want to remove.
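As a rough sketch of the erase-based subtraction, assuming that the serial number alone identifies a computer: the equality operator, the computer_hash functor, the computer_set alias and the not_for_sale function are all hypothetical additions, needed because std::unordered_set<computer_t> cannot be used without a hash and an equality comparison.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>

struct computer_t
{
    std::string serial;
    std::uint32_t gb_of_ram;
};

// Assumption: two computers are the same if their serials match.
inline bool operator==(const computer_t& a, const computer_t& b)
{
    return a.serial == b.serial;
}

// Hypothetical hasher, consistent with operator== above.
struct computer_hash
{
    std::size_t operator()(const computer_t& c) const noexcept
    {
        return std::hash<std::string>{}(c.serial);
    }
};

using computer_set = std::unordered_set<computer_t, computer_hash>;

// Returns the elements of `all` that are not in `for_sale`.
computer_set not_for_sale(const computer_set& all, const computer_set& for_sale)
{
    computer_set result = all;      // copy everything, then subtract
    for (const computer_t& c : for_sale)
        result.erase(c);            // erase by key; no-op if the key is absent
    return result;
}
```

Each erase is a hash lookup, so the subtraction is linear in the size of for_sale on average, on top of the cost of the initial copy.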

edited Feb 05 '22 at 07:56

answered Feb 05 '22 at 07:54

Thank you for your answer. Could you comment on some faster, non-STL alternatives? – j__ Feb 05 '22 at 07:55
3

@j5w: A bunch of them are listed here: https://stackoverflow.com/questions/3300525/super-high-performance-c-c-hash-map-table-dictionary - but you're wasting your time thinking about this now. – John Zwinck Feb 05 '22 at 07:57
