
I am trying to do a performance benchmark on C++11's std::unordered_map container.

I want to see how the load factor of the container affects insertion performance, specifically because I am interested in using a hash table as the base data structure for finding pairs in a huge set of numbers.

As I understand the documentation, this does not seem possible. I can set the number of buckets with rehash(), but a rehash is also performed automatically any time max_load_factor is exceeded.

I can set max_load_factor, but as I understand it, this only determines when a rehash is performed; it does not let me place the table under heavy strain, which is what I want to do.
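For reference, here is a minimal sketch (the bucket and element counts are arbitrary) of the behaviour described above: even after calling rehash(), the bucket count grows on its own once the default max_load_factor of 1.0 would be exceeded.

```cpp
#include <cstdio>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> m;
    m.rehash(64);  // request at least 64 buckets up front

    for (int i = 0; i < 1000; ++i)
        m.emplace(i, i);

    // With the default max_load_factor() of 1.0, bucket_count() has been
    // increased automatically to keep load_factor() at or below 1.0.
    std::printf("size=%zu buckets=%zu load_factor=%f\n",
                m.size(), m.bucket_count(), m.load_factor());
}
```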

Is there any way for me to hard-limit the number of buckets in a hash table?

getack

2 Answers


Set the max_load_factor to INFINITY. That way the container should never be tempted to do an automatic rehash to keep the load_factor below max_load_factor.
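A minimal sketch of how that could look in the benchmark, assuming you also fix the bucket count up front with rehash() (the bucket count of 1024 and the insertion loop are arbitrary choices for illustration):

```cpp
#include <cmath>          // INFINITY
#include <cstdio>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> m;

    // Make the automatic-rehash threshold unreachable, then pin the bucket count.
    m.max_load_factor(INFINITY);
    m.rehash(1024);  // at least 1024 buckets

    for (int i = 0; i < 100000; ++i)
        m.emplace(i, 2 * i);

    // bucket_count() should still be what rehash() gave us, so the
    // load factor keeps growing as elements are inserted.
    std::printf("buckets=%zu load_factor=%f\n",
                m.bucket_count(), m.load_factor());
}
```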

Howard Hinnant

Not sure if this is a good answer, but it's an explanation of why it might not be possible.

If you have open addressing you need to resize, but that is an implementation detail. You could also have an implementation that uses chaining for collision resolution, places a restriction on chain length, and resizes when that restriction is violated. Many things can happen under the hood.

What I mean is that, from the user's perspective, you are not guaranteed that you can safely fix the number of buckets, because some implementations might blow up. Even if you allow the load factor to be high, occasionally an insertion forces the table to resize because the target bucket is full and could not otherwise accept the element. That might happen even at relatively low load factors.

Of course some implementations might handle arbitrarily large load factors, but this is not a general property.

The bottom line is that fixing the number of buckets does not make much sense in general. You would only be able to experiment up to the point of a resize anyway, which may occur at different load factors depending on the key distribution. Basically, you cannot test arbitrarily heavy load on every implementation.

luk32
  • Your logic is sound from a Comp Sci perspective addressing hash tables in general, but `std::unordered_map` specifically has a default max load factor of 1.0, which isn't tenable unless there's chaining per bucket. – Tony Delroy Apr 08 '15 at 03:17
  • I didn't find the specific value; IMO it's implementation defined. Can you quote the source for the `1.0` value? It is obvious that this is the max for some implementations, but the standard does not define which one should be used. – luk32 Apr 08 '15 at 06:08
  • 23.5.4.2/1 and /3 say of the `std::unordered_map` constructors' effects (post-conditions): "`max_load_factor()` returns 1.0." That includes the constructor that populates the container based on iterator arguments, so it isn't just some default for empty containers that might be changed as soon as an element's added. Same deal for ...`multi`... and ...`set` containers, btw. – Tony Delroy Apr 08 '15 at 06:25
  • Oh, yeah, you are absolutely right. I missed "*by default*". But my point was that you might set it higher; however, that doesn't mean you will ever achieve it, precisely for the reason you stated. Only certain implementations could do it. On the other hand, for some implementations even `1.0` is not really achievable unless you have a perfect key distribution. What I tried to say is that this is the only parameter you can tune, and it's not very reliable. The practical consequences are highly tied to the concrete implementation. – luk32 Apr 08 '15 at 06:35
  • *"it's not very reliable"* - it *is* reliable precisely because it constrains the implementations to use open hashing. *"for some implementation even 1.0 is not really possible, unless you will have perfect key distribution"* - and performance as load factor approaches 1.0 would drop below the O(1) average case the Standard requires of e.g. `[]`, `at`, `insert`. So, we can drop all reasoning about `unordered_map` performance analysis that's based on the possibility of open addressing / closed hashing - such implementations can't be Standards conformant. – Tony Delroy Apr 08 '15 at 06:56
  • Your comment is very interesting, though I don't agree with it. You can have open-addressing hashing with O(1) average complexity, even if it degrades at load factor `1.0`. You cannot say load factor `1.0` is the average case. `max_load_factor` does not describe what is achievable; it's an upper bound at which resizing is guaranteed. An implementation might choose to resize on its own, below this value. Maybe this is worth asking as a separate question: what is the interpretation of the standard, and can open-addressing implementations be conformant or not? IMO, they can. – luk32 Apr 08 '15 at 07:50
  • We'll have to agree to disagree. Just for general background, you might also want to look at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html - specifically "B. Chaining Versus Open Addressing". Cheers. – Tony Delroy Apr 08 '15 at 08:15
  • I know discussions and thanks are not very welcomed here but still. Thank you for your comments and opinions, very valuable and interesting. "*We'll have to agree to disagree.*" - Sure thing =). Cheers. – luk32 Apr 08 '15 at 08:21
  • Appreciate your comment, and the discussion :-). Also think I've found something that even more clearly shows the need for `max_load_factor` to be honoured precisely: 23.2.5/15 *"The insert and emplace members shall not affect the validity of iterators if (N+n) < z * B, where N is the number of elements in the container prior to the insert operation, n is the number of elements inserted, B is the container’s bucket count, and z is the container’s maximum load factor."*. – Tony Delroy Apr 08 '15 at 08:44