I am reading Data Structures and Algorithms & Software Principles in C to try to wrap my head around some internals of data structures, and two things are really bothering me:

(1) How do hash tables deal with deciding which item in the bucket is the item you are looking up if they all have the same hash?

e.g.

  1. Get Key, Value
  2. use Hash algorithm on the key to find the index to try to put value into
  3. if the slot is taken but holds a single entry rather than a bucket, create a bucket, move the existing item into it, and then add the new value to it.
  4. now I have a bucket with a bunch of values and a "lost and found problem" where you can't tell which value belongs to which key because all the keys map to the same hash and the item in the bucket has no key to search the bucket by key.

This would work if the bucket saves keys as well as values for each entry, but I am confused since I can't find a site that confirms that hash tables save keys along with the values for their entries.

(2) How do hash tables tell if the value at an index is the correct value for the key, or if probing found a collision and put it elsewhere?

e.g.

  1. Get Key, Value
  2. hash key to find index(0)
  3. index 0 is taken, so use a naive linear probe: scan forward until a free slot is found (slot 1 is empty).
  4. now I search for my key and the hash gives index 0. How does the hash table know that index 0 does not hold the item for this key, and that the item was probed into slot 1?

Again, this would make sense to me if the table saved a key as well as value for the entry, but I am not sure if hashes save keys along with values for the entries or have another way of ensuring that the item at the hash index or bucket index is the correct item, or if I am misunderstanding it.

To clarify the question: do hash tables save key along with value to disambiguate buckets and probe sequences or do they use something else to avoid ambiguity of hashes?

Sorry for the crudely formulated question but I just had to ask.

Thanks ahead of time.

Dmytro
  • "the item in the bucket has no key to search the bucket by key" - why doesn't your hash table store keys? – user2357112 Jul 15 '16 at 23:58
  • @user2357112 i'm not sure if it's necessary. I need confirmation, I have doubts whether they do or don't. – Dmytro Jul 16 '16 at 00:00
  • @Dmitry: depends on the use case. In general, the entry is stored. Otherwise, collisions will need to be either returned to user without saving or be overwritten over the previous entry. – displayName Jul 16 '16 at 00:02

1 Answer

Hash Tables save the entry. An entry consists of key and value.

How do hash tables deal with deciding which item in the bucket is the item you are looking up if they all have the same hash?

Because the lookup is done by passing the key, and each entry in the bucket stores its own key to compare against.

The purpose of hashing is to reduce the time needed to find the index. The key is hashed to find the right bucket. Then, once the candidates have been reduced from a total of N to a very small n, you can simply perform a linear search, comparing stored keys, to find the right item among all the keys that have the same hash.
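The bucket search described above can be sketched in C roughly as follows. This is only a minimal illustration, not how any particular library does it: the names `chained_table`, `chain_hash`, `chained_put`, and `chained_get`, the table size, and the byte-sum hash are all assumptions made up for this example. The hash is deliberately weak so that `"ab"` and `"ba"` land in the same bucket.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 8  /* hypothetical, deliberately tiny table size */

/* Each node stores the key next to the value. */
struct entry {
    const char *key;
    int value;
    struct entry *next;
};

struct chained_table {
    struct entry *buckets[NBUCKETS];
};

/* Toy hash (sum of bytes), chosen so that collisions are easy to produce. */
static unsigned chain_hash(const char *key)
{
    unsigned h = 0;
    while (*key)
        h += (unsigned char)*key++;
    return h % NBUCKETS;
}

/* Insert or update; returns 0 on success, -1 if allocation fails. */
static int chained_put(struct chained_table *t, const char *key, int value)
{
    unsigned i = chain_hash(key);
    for (struct entry *e = t->buckets[i]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {  /* same key: overwrite the value */
            e->value = value;
            return 0;
        }
    }
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->key = key;  /* caller keeps the string alive; a real table might copy it */
    e->value = value;
    e->next = t->buckets[i];
    t->buckets[i] = e;
    return 0;
}

/* Returns a pointer to the stored value, or NULL when the key is absent. */
static int *chained_get(struct chained_table *t, const char *key)
{
    for (struct entry *e = t->buckets[chain_hash(key)]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)  /* the stored key disambiguates */
            return &e->value;
    return NULL;
}
```

With a zero-initialised table, putting `"ab"` and `"ba"` places both nodes in one bucket, and `chained_get` still returns the right value for each because it compares the stored keys rather than relying on the hash alone.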

How do hash tables tell if the value at an index is the correct value for the key, or if probing found a collision and put it elsewhere?

Again, that's because the hash table saves entries instead of just values. If the hash table sees that the key stored at this slot is not the key being queried, it knows that a collision occurred earlier and that the key may be in the next slot of the probe sequence. Note that in this case each bucket stores a single entry, unlike the first case, where a bucket may store a linked list or a tree of entries.
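A linear-probing lookup along these lines might look like the sketch below. Everything named here (`probed_table`, `probe_hash`, `probed_put`, `probed_get`, the 8-slot size, the byte-sum hash) is hypothetical and chosen only to make the collision from the question easy to reproduce. It also assumes entries are never deleted.

```c
#include <stddef.h>
#include <string.h>

#define NSLOTS 8  /* hypothetical, deliberately tiny table size */

/* One slot holds one entry; key == NULL marks an empty slot. */
struct slot {
    const char *key;
    int value;
};

struct probed_table {
    struct slot slots[NSLOTS];
};

/* Toy hash (sum of bytes), chosen so that collisions are easy to produce. */
static unsigned probe_hash(const char *key)
{
    unsigned h = 0;
    while (*key)
        h += (unsigned char)*key++;
    return h % NSLOTS;
}

/* Insert or update; returns the slot index used, or -1 if the table is full. */
static int probed_put(struct probed_table *t, const char *key, int value)
{
    unsigned home = probe_hash(key);
    for (unsigned step = 0; step < NSLOTS; step++) {
        unsigned i = (home + step) % NSLOTS;
        struct slot *s = &t->slots[i];
        if (s->key == NULL || strcmp(s->key, key) == 0) {
            s->key = key;  /* caller keeps the string alive */
            s->value = value;
            return (int)i;
        }
    }
    return -1;
}

/* Returns a pointer to the stored value, or NULL when the key is absent. */
static int *probed_get(struct probed_table *t, const char *key)
{
    unsigned home = probe_hash(key);
    for (unsigned step = 0; step < NSLOTS; step++) {
        struct slot *s = &t->slots[(home + step) % NSLOTS];
        if (s->key == NULL)
            return NULL;  /* an empty slot ends the probe sequence */
        if (strcmp(s->key, key) == 0)
            return &s->value;  /* stored key matches the queried key */
        /* stored key differs: a collision happened earlier, keep probing */
    }
    return NULL;
}
```

Here `"ab"` and `"ba"` share the same home slot, so the second one is probed into the next slot. A lookup for `"ba"` sees `"ab"` stored at the home slot, notices the keys differ, and moves on. Supporting deletion would need something extra, such as tombstone markers, so that removing an entry does not cut a probe sequence short.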

displayName
  • Ok that makes sense. I had trouble believing hash entries stored keys as well as values since I always saw hash tables as one-way maps from key to value, rather than from key to an entry consisting of key and value. Having a key guarantees all entries can be mapped back to the key, assuming there are no attempts to store multiple different values under the same key; such insertion requests are easy to detect, ignore, and report. Thanks for confirming that keys are indeed required to be stored inside hash table entries, unless it's okay to have cases that corrupt the hash table. – Dmytro Jul 16 '16 at 00:06
  • @Dmitry: Not only are keys stored, the keys should **never** be modified after an entry has been inserted, because the entry with the modified key is then no longer in the correct bucket and hence cannot be found again when searched. – displayName Jul 16 '16 at 00:14
  • Isn't it possible to remap keys as long as you create a whole new hash table with a hash algorithm that generates such keys, and pass all the entries' keys through the new algorithm to find the new indices, thus maintaining the hash table invariants? Interesting that this would also be impossible without knowing the key inside the hash table entries. – Dmytro Jul 16 '16 at 00:20
  • @Dmitry: As long as you rehash and add the entry to the correct index, you can do whatever you want. – displayName Jul 17 '16 at 22:06