
According to this, the time complexity of search in a hashtable is O(1).

However, if there is a collision, then obviously this should be O(1) + something.

My question is:

When you say

get(someKey) 

from a hashtable, the hash function is applied to someKey, and the data is retrieved directly from that location.

But imagine separate chaining is used for collision resolution, and imagine someKey and someOtherKey produce the same output when our hash function is applied to them. Say that value is "25".

So when I say

get(someKey)

I will get the data from location "25", which makes it O(1). Great.

However when I say

get(someOtherKey) 

Now someOtherKey is chained to the same location as someKey.

When the hash function is applied to someOtherKey, I also get 25.

How do I get the value I require? What are the internals? How does the algorithm flow? Is there some other table for storing all the collisions?
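To make the scenario concrete, here is a minimal sketch (the keys and values are just placeholders):

    import java.util.Hashtable;

    Hashtable<String, Integer> table = new Hashtable<>();
    table.put(someKey, 1);       // our hash function maps someKey to 25
    table.put(someOtherKey, 2);  // someOtherKey also maps to 25 -- a collision
    table.get(someOtherKey);     // how does this return 2 and not 1?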

Thank you. I hope my question is clear!

Koray Tugay
  • Here is a very good resource on this: http://www.youtube.com/watch?v=UPo-M8bzRrc&list=EC4BBB74C7D2A1049C&index=21 – Koray Tugay Mar 30 '13 at 18:43

1 Answer


There are many different data structures possible for handling collisions. Here's a good summary: http://en.wikipedia.org/wiki/Hash_table

The hash function narrows the search down to a single bucket in the data structure. The bucket then contains another data structure for resolving collisions: it can be a link to an array, where the keys are maintained in sorted or unsorted order; the first element in a linked list of keys; or the root node of a b-tree. The important point is that the hash function very quickly narrows down the scope of the search.
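To illustrate the flow, here is a minimal sketch in Java of a chained get, where each table slot holds the head of that bucket's linked list. ChainedTable, Entry, and the table size are invented for illustration; real implementations such as java.util.HashMap differ in detail:

    class ChainedTable<K, V> {
        private static class Entry<K, V> {
            final K key;
            V value;
            Entry<K, V> next;   // next entry in this bucket's chain
            Entry(K key, V value, Entry<K, V> next) {
                this.key = key;
                this.value = value;
                this.next = next;
            }
        }

        @SuppressWarnings("unchecked")
        private Entry<K, V>[] table = (Entry<K, V>[]) new Entry[64];

        public V get(K key) {
            // The hash only picks the bucket -- e.g. 25 for both keys.
            int bucket = (key.hashCode() & 0x7fffffff) % table.length;
            // Within the bucket, entries are compared by their actual keys.
            for (Entry<K, V> e = table[bucket]; e != null; e = e.next) {
                if (e.key.equals(key)) {
                    return e.value;   // the entry for this exact key
                }
            }
            return null;              // hash matched, but key not present
        }
    }

So get(someOtherKey) lands in bucket 25 and then walks the chain, comparing stored keys with equals() until it finds someOtherKey itself.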

Once the scope is narrowed, some other less-efficient search can be useful to work through the collisions. It's all about the trade-offs. You want a hashing algorithm that gives a large enough range of hashes (and buckets) to minimize collisions, limited by how much memory you can afford. If collisions are rare, a linear search through a linked list of collisions isn't bad. If there are many collisions, then the efficiency of re-sizing the bucket arrays becomes more important.
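To make the resizing trade-off concrete, here is a hypothetical put/resize pair for the ChainedTable sketch above. The 0.75 load-factor threshold mirrors java.util.HashMap's default; everything else is invented for illustration:

    private int size = 0;

    public void put(K key, V value) {
        if (size > table.length * 3 / 4) {   // load factor above 0.75
            resize();                        // more buckets, shorter chains
        }
        int bucket = (key.hashCode() & 0x7fffffff) % table.length;
        for (Entry<K, V> e = table[bucket]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                e.value = value;             // key already present: replace
                return;
            }
        }
        table[bucket] = new Entry<>(key, value, table[bucket]); // prepend
        size++;
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        Entry<K, V>[] old = table;
        table = (Entry<K, V>[]) new Entry[old.length * 2]; // double buckets
        for (Entry<K, V> head : old) {
            for (Entry<K, V> e = head; e != null; ) {
                Entry<K, V> next = e.next;
                // Bucket indices depend on table.length, so recompute them.
                int bucket = (e.key.hashCode() & 0x7fffffff) % table.length;
                e.next = table[bucket];      // prepend into the new chain
                table[bucket] = e;
                e = next;
            }
        }
    }

The resize is a one-time O(n) pass, paid for by shorter chains (and faster gets) afterwards.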

Mark Taylor
  • Thanks. So a hashtable keeps buckets, not the values directly? – Koray Tugay Mar 30 '13 at 15:53
  • That's implementation dependent. Some designs may be structured to keep a certain number of values directly in a data structure in the bucket, and only overflow to a secondary structure once the bucket is full. It especially makes sense to keep one value in the bucket and then have a pointer to a linked list for any secondary values, so it is in effect keeping the first node of the linked list in the bucket (see the sketch after these comments). – Mark Taylor Mar 30 '13 at 15:57
  • It also depends on the size of the values. A general case hashtable implemented in a library has to be flexible for many possible sizes and ranges of values. If the data set (number of values, range of values) is known in advance, it is possible to design a specific structure to be more efficient than the general case. It's just a lot of work that is only worth it for the edge cases. – Mark Taylor Mar 30 '13 at 16:06
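A minimal sketch of the inline-first-value design described in these comments: the first key/value pair of each bucket lives directly in the table's arrays, and only secondary collisions spill into a linked overflow chain. All names are invented for illustration:

    class InlineFirstTable<K, V> {
        private static class Overflow<K, V> {
            final K key;
            V value;
            Overflow<K, V> next;
            Overflow(K key, V value, Overflow<K, V> next) {
                this.key = key;
                this.value = value;
                this.next = next;
            }
        }

        private final Object[] keys = new Object[64];    // first key per bucket
        private final Object[] values = new Object[64];  // first value per bucket
        @SuppressWarnings("unchecked")
        private final Overflow<K, V>[] overflows =
                (Overflow<K, V>[]) new Overflow[64];     // secondary values only

        @SuppressWarnings("unchecked")
        public V get(K key) {
            int i = (key.hashCode() & 0x7fffffff) % keys.length;
            if (key.equals(keys[i])) {
                return (V) values[i];   // hit on the inline first pair
            }
            // Only a genuine collision forces a hop into the overflow chain.
            for (Overflow<K, V> o = overflows[i]; o != null; o = o.next) {
                if (key.equals(o.key)) {
                    return o.value;
                }
            }
            return null;
        }
    }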