8

I am wondering what the time complexity of resizing a Java HashMap is when the load factor exceeds the threshold. As far as I understand, for HashMap the table size is always a power of 2 (an even number), so whenever we resize the table we don't necessarily need to rehash all the keys (correct me if I am wrong); all we need to do is allocate additional space and copy over all the entries from the old table (I am not quite sure how the JVM deals with that internally), correct? Whereas for Hashtable, since it uses a prime number as the table size, we need to rehash all the entries whenever we resize the table. So my question is: does it still take O(n) linear time to resize a HashMap?

peter
  • 8,333
  • 17
  • 71
  • 94
  • You could always just study the [source for HashMap](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashMap.java). :) – Ted Hopp Jan 10 '13 at 05:25

2 Answers

9

Does it still take O(N) time for resizing a HashMap?

Basically, yes.

And a consequence is that an insertion operation that causes a resize will take O(N) time. But that happens on O(1/N) of all insertions, so (under certain assumptions) the average insertion time is O(1).
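A rough way to convince yourself of that is to count how much copy work the resizes do over a whole sequence of insertions. The following is a minimal simulation, not HashMap's actual code; it assumes the default initial capacity (16), the default load factor (0.75), and doubling on each resize:

```java
public class AmortizedResizeCost {
    public static void main(String[] args) {
        int n = 1_000_000;          // total number of insertions to simulate
        int capacity = 16;          // HashMap's default initial capacity
        double loadFactor = 0.75;   // HashMap's default load factor
        long entriesCopied = 0;     // total entries moved across all resizes

        for (int size = 1; size <= n; size++) {
            // Double the table once the size exceeds the threshold;
            // every entry already in the table has to be moved.
            if (size > capacity * loadFactor) {
                entriesCopied += size;
                capacity *= 2;
            }
        }
        // The copy work forms a geometric series bounded by roughly 2 * n,
        // so the average (amortized) cost per insertion is O(1) even though
        // the individual insertion that triggers a resize costs O(N).
        System.out.printf("insertions=%d, entries copied by resizes=%d%n",
                n, entriesCopied);
    }
}
```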

So could a good load factor affect this performance? Like better and faster than O(N)?

Choice of load factor affects performance, but not complexity.

If we make normal assumptions about the hash function and key clustering, when the load factor is larger:

  • the average hash chain length is longer, but still O(1),
  • the frequency of resizes is lower, but it is still O(1/N),
  • the cost of a resize remains about the same, and the complexity is still O(N).
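For completeness, the load factor is just a constructor parameter; here is a small sketch of the trade-off (it moves the constants around, not the complexity):

```java
import java.util.HashMap;
import java.util.Map;

public class LoadFactorChoice {
    public static void main(String[] args) {
        // Lower load factor: the table resizes earlier, so buckets stay
        // emptier (shorter chains) at the cost of more memory.
        Map<String, Integer> sparse = new HashMap<>(16, 0.5f);

        // Higher load factor: fewer resizes and a denser table, but longer
        // average chains. Lookups and insertions are still O(1) on average,
        // and a resize is still O(N) when it happens.
        Map<String, Integer> dense = new HashMap<>(16, 0.9f);

        sparse.put("answer", 42);
        dense.put("answer", 42);
        System.out.println(sparse.get("answer") + " " + dense.get("answer"));
    }
}
```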

... so whenever we resize the table we don't necessarily need to rehash all the keys (correct me if I am wrong).

Actually, you would need to rehash all of the keys. When you double the hash table size, the hash chains need to be split. To do this, you need to test which of the two chains the hash value for each key maps to. (Indeed, you would need to do the same if the hash table had an open organization.)

However, in the current generation of HashMap implementations, the hashcode values are cached in the chained entry objects, so that the hashcode for a key doesn't ever need to be recomputed.
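To make the splitting step concrete, here is a minimal sketch of the idea (an illustration, not the actual OpenJDK code; `Node` and `resize` are hypothetical stand-ins for HashMap's internal entry class and resize method). Because the capacity is a power of two, one extra bit of the cached hash decides which of the two new buckets an entry moves to, without calling `hashCode()` again:

```java
// Hypothetical stand-in for HashMap's internal entry: it caches the key's
// hash, which is why resizing never has to recompute hashCode().
final class Node<K, V> {
    final int hash;   // cached hash, computed once at insertion
    final K key;
    V value;
    Node<K, V> next;

    Node(int hash, K key, V value, Node<K, V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
}

final class ResizeSketch {
    /** Doubles the table; each old bucket splits into (at most) two new buckets. */
    static <K, V> Node<K, V>[] resize(Node<K, V>[] oldTab) {
        int oldCap = oldTab.length;              // always a power of two
        @SuppressWarnings("unchecked")
        Node<K, V>[] newTab = (Node<K, V>[]) new Node[oldCap * 2];
        for (int i = 0; i < oldCap; i++) {
            for (Node<K, V> e = oldTab[i], next; e != null; e = next) {
                next = e.next;
                // One extra bit of the cached hash decides which of the two
                // target buckets the entry goes to: index i or i + oldCap.
                int newIndex = (e.hash & oldCap) == 0 ? i : i + oldCap;
                e.next = newTab[newIndex];
                newTab[newIndex] = e;
            }
        }
        return newTab;
    }
}
```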


One comment mentioned the degenerate case where all keys hash to the same hashcode. That can happen either due to a poorly designed hash function, or a skewed distribution of keys.

This affects performance of lookup, insertion and other operations, but it does not affect either the cost or frequency of resizes.
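To illustrate that degenerate case, here is a deliberately broken `hashCode`, purely for demonstration:

```java
import java.util.HashMap;
import java.util.Map;

public class DegenerateHashExample {
    // A deliberately bad key type: every instance has the same hashcode,
    // so all entries land in the same bucket.
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        // In older implementations each put walks one ever-growing chain to
        // check for an existing equal key; newer implementations convert long
        // chains to balanced trees, which softens but does not remove the
        // penalty. Resizes are still triggered at the same element counts
        // and still cost O(n) each.
        for (int i = 0; i < 10_000; i++) {
            map.put(new BadKey(i), i);
        }
        System.out.println(map.size()); // 10000
    }
}
```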

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • So does it mean the insertion takes O(n) in worst case ? – peter Jan 10 '13 at 05:33
  • What insertion? We were talking about resizing weren't we? – Stephen C Jan 10 '13 at 05:35
  • 1
    Insertion would be O(n) only in a degenerate case where ALL the keys hash to the same value. – Jim Garrison Jan 10 '13 at 05:36
  • I mean resizing occurs (when exceeds threshold) right after insertion isn't it ? – peter Jan 10 '13 at 05:37
  • 1
    @user1389813 - In that case, yes. The average cost of HashMap.insert() is `O(1)` but the worst case is `O(N)`. But this isn't that strange. The same thing happens with `StringBuffer.append`, appending to an `ArrayList` and so on. – Stephen C Jan 10 '13 at 05:39
  • @JimGarrison why is that ? – peter Jan 10 '13 at 05:40
  • @user1389813 - because if all of the keys hash to the same hashcode, they all end up on the same chain, and the insertion would need to check all of the entries. – Stephen C Jan 10 '13 at 05:42
  • @user1389813 - but Jim is wrong about this being the ONLY case ... unless he is talking about an **average** cost of `O(N)` for insertions. Bottom line is that the complexity analysis of hash tables is notoriously tricky. – Stephen C Jan 10 '13 at 05:45
  • @StephenC so could a good load factor affect this performance ? like better and faster than O(N) ? – peter Jan 10 '13 at 05:45
  • The load factor is an empirical compromise between the space used to represent the hash table and the average length of the hash chains. Even if you change it, the cost of a resize will still be O(N). The difference will be in the values of the constants of proportionality; e.g. in the lookup cost, the resize cost and the space usage. Generally speaking the default load factor is a good one. – Stephen C Jan 10 '13 at 05:51
  • @StephenC Can you remind me of the other cases where insertions could be O(n), assuming a hash table with separate chaining? – Alex DiCarlo Jan 10 '13 at 07:04
  • @dicarlo2 - when a large proportion of the keys' hashcodes are the same. As identified by Jim's comment. – Stephen C Jan 10 '13 at 09:15
  • @AlexDiCarlo - also, if you do an insertion and that insertion triggers a resize, then that insertion operation will be `O(N)`. The `O(1)` characterization for insertion, is an *average* over the lifetime of the hash table. – Stephen C Dec 14 '13 at 23:28
0

When the table is resized, the entire contents of the original table must be copied to the new table, so it takes O(n) time to resize the table, where n is the number of elements in the original table. The amortized cost of any operation on a HashMap (under the uniform hashing assumption) is O(1), but yes, the worst-case cost of a single insertion operation is O(n).
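As a practical aside, if the occasional O(n) insertion matters you can avoid resizes entirely by sizing the map for the expected number of entries up front; a small sketch, assuming the default 0.75 load factor:

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMapExample {
    public static void main(String[] args) {
        int expectedEntries = 1_000_000;

        // Default construction: the table starts small and doubles
        // (an O(current size) copy) every time the threshold is crossed.
        Map<Integer, String> grown = new HashMap<>();

        // Sized so expectedEntries fits under the default 0.75 load factor:
        // no resize is ever triggered while filling the map.
        Map<Integer, String> presized =
                new HashMap<>((int) (expectedEntries / 0.75f) + 1);

        for (int i = 0; i < expectedEntries; i++) {
            grown.put(i, "v" + i);
            presized.put(i, "v" + i);
        }
        // Same contents either way; the pre-sized map just skipped the
        // intermediate O(n) copy steps.
        System.out.println(grown.size() == presized.size());
    }
}
```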

Alex DiCarlo
  • 4,851
  • 18
  • 34