
I am confused about the time complexity of hash tables. Many articles state that they are "amortized O(1)", not true O(1); what does this mean in real applications? What is the average time complexity of the operations on a hash table, in actual implementations rather than in theory, and why are the operations not true O(1)?

Daenyth
marme
  • This is related, although not exactly the same question: http://stackoverflow.com/questions/2369467/why-are-hash-table-expansions-usually-done-by-doubling-the-size – Pascal Cuoq Oct 16 '10 at 14:34
  • This helps to answer insertion but does not explain anything about the other operations; I am most interested in an explanation of the time complexity of lookup in a hash table – marme Oct 16 '10 at 14:45
  • Under some hypotheses on the hash function, lookup is real O(1) time for most hash-table implementations. Indeed, in some implementations with bounded bucket depth, it is constant by design. – Pascal Cuoq Oct 16 '10 at 14:52

3 Answers


It's impossible to know in advance how many collisions you will get with your hash function, or when you will need to resize. This adds an element of unpredictability to the performance of a hash table, so it is not true O(1). However, virtually all hash table implementations offer O(1) on the vast, vast, vast majority of inserts. This is the same as inserting into a dynamic array: it's O(1) unless you need to resize, in which case it's O(n), plus the collision uncertainty.

In reality, hash collisions are very rare and the only condition in which you'd need to worry about these details is when your specific code has a very tight time window in which it must run. For virtually every use case, hash tables are O(1). More impressive than O(1) insertion is O(1) lookup.
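To make the dynamic-array analogy concrete, here is a minimal sketch (Python, with made-up names; it is not taken from any particular library) that counts how many element copies N appends into a doubling array cause. The occasional append that triggers a resize costs O(n), but the total number of copies stays below 2N, which is exactly what "amortized O(1)" means:

    class DoublingArray:
        def __init__(self):
            self.capacity = 1
            self.size = 0
            self.slots = [None] * self.capacity
            self.copies = 0                    # total elements moved during resizes

        def append(self, value):
            if self.size == self.capacity:     # this particular append costs O(n)
                self.capacity *= 2             # geometric growth keeps resizes rare
                new_slots = [None] * self.capacity
                for i in range(self.size):
                    new_slots[i] = self.slots[i]
                    self.copies += 1
                self.slots = new_slots
            self.slots[self.size] = value      # the ordinary O(1) case
            self.size += 1

    arr = DoublingArray()
    n = 1_000_000
    for i in range(n):
        arr.append(i)
    print(arr.copies, arr.copies / n)          # roughly 1.05 copies per append

A hash table that grows its bucket array on a load-factor threshold amortizes its resizes in the same way.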

Puppy

For some uses of hash tables, it's impossible to create them at the "right" size in advance, because it is not known how many elements will need to be held simultaneously during the lifetime of the table. If you want to keep access fast, you need to resize the table from time to time as the number of elements grows. This resizing takes time linear in the number of elements already in the table, and is usually triggered by an insertion, when the number of elements passes a threshold.

These resizing operations can be made seldom enough that the amortized cost of insertion is still constant (by following a geometric progression for the size of the table, for instance doubling the size each time it is resized). But one insertion from time to time takes O(n) time because it triggers a resize.

In practice, this is not a problem unless you are building hard real-time applications.
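As a concrete sketch of this scheme (illustrative only, in Python; the names and the 0.75 load-factor threshold are my own choices, not taken from this answer or from any specific implementation), here is a separate-chaining table that doubles its bucket array when an insertion pushes the load factor past the threshold:

    class ChainedHashTable:
        def __init__(self, initial_buckets=8, max_load=0.75):
            self.buckets = [[] for _ in range(initial_buckets)]
            self.count = 0
            self.max_load = max_load

        def _bucket(self, key):
            return self.buckets[hash(key) % len(self.buckets)]

        def put(self, key, value):
            bucket = self._bucket(key)
            for i, (k, _) in enumerate(bucket):
                if k == key:                   # key already present: overwrite
                    bucket[i] = (key, value)
                    return
            bucket.append((key, value))        # average O(1) with a decent hash
            self.count += 1
            if self.count / len(self.buckets) > self.max_load:
                self._resize()                 # this particular insertion costs O(n)

        def get(self, key):
            for k, v in self._bucket(key):     # short chain on average => O(1)
                if k == key:
                    return v
            raise KeyError(key)

        def _resize(self):
            # Double the bucket array and re-hash every stored entry.
            old_items = [item for bucket in self.buckets for item in bucket]
            self.buckets = [[] for _ in range(2 * len(self.buckets))]
            for k, v in old_items:
                self.buckets[hash(k) % len(self.buckets)].append((k, v))

    table = ChainedHashTable()
    for i in range(10_000):
        table.put(i, i * i)    # most inserts are O(1); a few trigger O(n) rehashes
    print(table.get(1234))     # 1522756

Because the capacity grows geometrically, the expensive insertions become exponentially rarer, so the amortized cost per insertion stays constant even though an individual insertion occasionally pays for a full rehash.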

Pascal Cuoq
  • It's not only the size that's the consideration - it's also the hash collisions. There are different ways of dealing with them, but whatever you do it won't happen in O(1) time. The average case is still close to O(1) in practice though unless the hash table gets quite full – Jords Oct 16 '10 at 14:58
  • 3
    @Jords I do not know what "close to O(1)" means. Besides, I am pretty confident that the "amortized O(1)" found in the literature corresponds to hypotheses on the hash function where the bucket depth remains below a fixed bound, hence constant time. Because if the lookup without resizing was not constant time, the amortized lookup would certainly not be constant time either. – Pascal Cuoq Oct 16 '10 at 15:10

Inserting a value into a hash table takes, in the average case, O(1) time. The hash function is computed, the bucket is chosen from the hash table, and the item is inserted. In the worst case, all of the elements will have hashed to the same value, which means either the entire bucket list must be traversed or, in the case of open addressing, the entire table must be probed until an empty spot is found. Therefore, in the worst case, insertion takes O(n) time.

Reference: http://www.cs.unc.edu/~plaisted/comp550/Neyer%20paper.pdf (Hash Table section)
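To see the worst case concretely, here is a small sketch (Python, hypothetical names; the constant hash function is deliberately pathological) comparing a hash function that spreads keys across buckets with one that sends every key to bucket 0, so that a lookup has to scan all n entries in a single chain:

    NUM_BUCKETS = 16_384

    def good_hash(key):
        return hash(key) % NUM_BUCKETS   # spreads these keys across many buckets

    def bad_hash(key):
        return 0                         # every key collides in bucket 0

    def build(keys, hash_fn):
        buckets = [[] for _ in range(NUM_BUCKETS)]
        for key in keys:
            buckets[hash_fn(key)].append(key)
        return buckets

    def lookup_steps(buckets, key, hash_fn):
        # Number of chain entries inspected before the key is found.
        for steps, candidate in enumerate(buckets[hash_fn(key)], start=1):
            if candidate == key:
                return steps
        return None

    keys = list(range(10_000))
    good = build(keys, good_hash)
    bad = build(keys, bad_hash)

    print(lookup_steps(good, 9_999, good_hash))   # short chain: O(1) on average
    print(lookup_steps(bad, 9_999, bad_hash))     # 10000 comparisons: O(n)

With the spreading hash each chain stays short, so lookup is O(1) on average; with the constant hash the single chain holds all n entries and lookup degrades to a linear scan.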

Rohit Jain