
Let's say there is a hash function. It stores n key-value pairs. If I need the value of a particular key, does the hash function traverse all the keys to find the one whose value we are looking for? If yes, then how come the complexity is O(1)? How does a hash look up a key?

CodesInChaos
user1322495

3 Answers


No, that's not how a hash table works. A hash table is essentially an array that stores values at indexes corresponding to their keys' hashes. So, let's say I want to map a string "abc" to another string "xyz", and assume "abc" hashes to 42. What I would do is go to index 42 of my table and place the string "xyz" there. Now, if later I want to find the value associated with the string "abc", I would hash it again and go to the corresponding index (42), finding "xyz" there. This is overall an O(1) operation. In summary:

Mapping "abc" to "xyz"...

1. hash("abc") = 42

2. Place "xyz" in the table:

 ---+-----------------------------------+---
... |      |      | "xyz" |      |      | ...
 ---+-----------------------------------+---
       40     41     42     43     44

Later...

1. Query "abc"

2. hash("abc") = 42

3. Look at index 42, find the value "xyz"
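
To make this concrete, here is a minimal sketch in Java of the direct-indexing idea above (a hypothetical class; it ignores collisions and resizing entirely):

    // Minimal sketch: the value is stored at the index derived from the
    // key's hash. Hypothetical class, ignoring collisions and resizing.
    class NaiveHashTable {
        private final String[] slots = new String[64];

        // Reduce the key's hash code to a valid, non-negative array index.
        private int indexFor(String key) {
            return (key.hashCode() & 0x7fffffff) % slots.length;
        }

        void put(String key, String value) {
            slots[indexFor(key)] = value;   // e.g. "abc" lands at its hash index
        }

        String get(String key) {
            return slots[indexFor(key)];    // one hash + one array access: O(1)
        }
    }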

I've oversimplified slightly just to portray the gist of how a hash table works, and I urge you to go through the hash table Wikipedia article for a more in-depth description. Note also that hash tables are often implemented as an array of linked lists, to account for cases where two keys hash to the same number (so-called hash collisions). A plain array could not handle such cases, since we would not be able to store multiple values at the same location. This is, for example, how Java implements HashMap.

For instance, take the example above and assume we also want to map "123" to "pqr", and that "123" also hashes to 42. The final result would look something like this:

       40     41     42     43     44
 ---+-----------------------------------+---
... |      |      |   +   |      |      | ...
 ---+-----------------|-----------------+---
                      |
              +---------------+
              | "abc" | "xyz" |
              +---------------+
                      |
              +---------------+
              | "123" | "pqr" |
              +---------------+

Notice that we now have to explicitly store the key along with the value. Now, if we wanted to query with the key "123" we would go to its hash location (42) and traverse the linked list found there until we find the entry with the key "123". We would then return the corresponding value, "pqr".
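
A rough sketch of this chaining scheme in Java (again hypothetical, not the actual HashMap source) might look like this:

    import java.util.LinkedList;

    // Sketch of separate chaining: each slot holds a list of key-value
    // entries whose keys all hash to that slot.
    class ChainedHashTable {
        static class Entry {
            final String key;     // the key must be stored explicitly
            String value;
            Entry(String key, String value) { this.key = key; this.value = value; }
        }

        @SuppressWarnings("unchecked")
        private final LinkedList<Entry>[] buckets = new LinkedList[64];

        private int indexFor(String key) {
            return (key.hashCode() & 0x7fffffff) % buckets.length;
        }

        void put(String key, String value) {
            int i = indexFor(key);
            if (buckets[i] == null) buckets[i] = new LinkedList<>();
            for (Entry e : buckets[i]) {
                if (e.key.equals(key)) { e.value = value; return; }  // overwrite
            }
            buckets[i].add(new Entry(key, value));
        }

        String get(String key) {
            int i = indexFor(key);
            if (buckets[i] == null) return null;
            for (Entry e : buckets[i]) {          // traverse the (short) chain
                if (e.key.equals(key)) return e.value;
            }
            return null;
        }
    }

Note that get() compares stored keys with equals(), which is precisely why the key has to be kept alongside the value.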

At this point you might have two questions:

  • How does the hash() function operate in O(1)?
  • If we need to traverse a linked list, how can the entire operation be O(1)?

As for the first question, the hashing process (while perhaps not actually a constant time operation) is generally not taken into account when talking about the complexity of a hash table, simply because it is assumed to not be very time consuming when compared to other subsequent processes. In fact, in many cases hashing actually is constant. For instance, since strings are immutable in many languages, their hash values are often only computed once and then cached, resulting in constant time hashing after the first hash operation.
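
Java's String is a good illustration of the caching point: since the contents never change, the hash code can be computed once and reused. A simplified sketch of that idea (not the actual java.lang.String source):

    // Simplified sketch of hash-code caching for an immutable key type.
    final class CachedKey {
        private final String data;
        private int hash;   // 0 means "not yet computed" (the same trick String uses)

        CachedKey(String data) { this.data = data; }

        @Override
        public int hashCode() {
            if (hash == 0) {
                hash = data.hashCode();  // computed at most once, then cached
            }
            return hash;                 // every later call is O(1)
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CachedKey && data.equals(((CachedKey) o).data);
        }
    }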

As for the second question, when we have a good hash function and a reasonably sized table, the linked lists that form should be very short (presumably no more than 3 in length). For this reason, the traversal process is considered to be constant time.

arshajii
  • How do you find hash("abc") = 42 in O(1)? I want to know how the hash finds the key directly. How is it actually stored internally? – user1322495 Oct 06 '13 at 12:43
  • @user1322495 That's a good question. Generally the cost of the hash function is not taken into account when determining the complexity. We assume that the hashing process (while perhaps not *actually* constant time) is not very time consuming when compared to the other hash table operations, so we just consider it to be constant time. As for how it's stored internally, it's an array as I've described. – arshajii Oct 06 '13 at 12:46
  • @user1322495 I've edited the answer to reflect this. – arshajii Oct 06 '13 at 12:59
  • @user1322495 Usually there is some constant-time process to map the string abc to 42, something like maybe adding the characters' ASCII values to produce a numeric index. But the point is that it is some constant-time calculation. There is never a guarantee that there will be no collisions, in which case there is additional overhead, but the AVERAGE case remains O(1). Please see my answer – fkl Oct 06 '13 at 13:17
  • @arshajii, respectfully I would say it is not something 'generally not taken into account', since key-based retrieval is the core of hash tables. It is true that the retrieval is not O(1) in every case, but it is O(1) in most cases. Otherwise, we have many collisions and hence a bad hash function. Please see my answer too – fkl Oct 06 '13 at 13:25
  • @fayyazkl That statement refers to the complexity of the hash function itself when talking about the complexity of the overall hash table. We usually assume that the hash function is not very expensive (O(1)). I agree with you that retrieval is O(1) in most cases, that's the point I was trying to make. – arshajii Oct 06 '13 at 13:31

The "hash" in the name is a function which basically turns the key into an (ideally) unique index for that key. In practice, each hash is a "bucket" which may contain multiple values, to allow for collisions.

See also http://en.wikipedia.org/wiki/Hash_function
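
In code, the key-to-bucket step usually boils down to something like the following sketch (illustrative only, not any particular library's implementation):

    class Buckets {
        // Reduce an arbitrary hash code to a valid bucket index.
        static int bucketIndex(Object key, int bucketCount) {
            int h = key.hashCode();                 // the "hash" in the name
            return (h & 0x7fffffff) % bucketCount;  // non-negative index < bucketCount
        }
    }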

tripleee

Although both earlier answers are correct (and I upvoted both), I felt like adding some insight.

First, a hash table is an abstract data type, meaning there can be many data structures and retrieval algorithms used in a particular implementation. Arrays, binary search trees, dictionaries, etc. are just examples of some possible implementations.

Secondly, the important point is that the average-case retrieval time for accessing the value of a key is constant, i.e. O(1); the worst case is not.

So the key maps to an ideally unique storage location. However, collisions are always possible in a real scenario, and they are handled by storing the multiple values somehow (say, in a linked list, a tree, or yet another second-level hash).

The point is, for a good hash function, collisions are quite rare compared to normal constant-time index accesses. Hence the phrase AVERAGE CASE.

Retrieving the value for a key in the worst case will never be O(1). It can be O(log n) or even worse than that. But its frequency of occurrence is so small, for any good hash function, that the average-case complexity remains O(1), i.e. constant time.
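
The average/worst-case distinction is easy to demonstrate with a deliberately bad hash function: if every key lands in the same bucket, lookups degrade to a search of that one bucket. A hypothetical Java demonstration:

    import java.util.HashMap;
    import java.util.Map;

    public class WorstCaseDemo {
        // A key type whose hash function sends every instance to the same bucket.
        static final class BadKey {
            final int id;
            BadKey(int id) { this.id = id; }
            @Override public int hashCode() { return 42; }  // every key collides
            @Override public boolean equals(Object o) {
                return o instanceof BadKey && ((BadKey) o).id == id;
            }
        }

        public static void main(String[] args) {
            Map<BadKey, String> map = new HashMap<>();
            for (int i = 0; i < 10_000; i++) {
                map.put(new BadKey(i), "value" + i);  // all entries share one bucket
            }
            // This lookup must search the single oversized bucket instead of
            // jumping straight to the entry, so it is no longer O(1).
            System.out.println(map.get(new BadKey(9_999)));
        }
    }

(Since Java 8, HashMap converts long chains into balanced trees, so a pathological case like this degrades to O(log n) rather than O(n), matching the log(n) worst case mentioned above.)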

fkl