
Can somebody please explain the concept of buckets to me in simple terms? I understand that a Dict is an array of arrays, but I cannot for the life of me make sense of this first block of code, and I can't find anything online that explains num_buckets. If you could explain it line by line, that would be great.

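module Dict
  # Initializes a Dict with the given number of buckets.
  def Dict.new(num_buckets=256)
    aDict = []                       # outer array that will hold the buckets
    (0...num_buckets).each do |i|    # repeat once for each index 0 up to num_buckets - 1
      aDict.push([])                 # each bucket starts out as its own empty array
    end

    return aDict                     # an array containing num_buckets empty arrays
  end
end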
Yu Hao
mav91
  • The creation and initialization of `aDict` could have been written, `aDict = Array.new(num_buckets) {[]}`, in which case the `return` would not have been needed. – Cary Swoveland Jun 20 '15 at 19:06
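
The one-liner suggested in the comment above can be compared with the loop from the question. This is only a minimal sketch: `build_with_loop` and `build_with_block` are hypothetical names used for the comparison, not part of the asker's module.

```ruby
# Hypothetical helper: the loop from the question, same behavior.
def build_with_loop(num_buckets = 256)
  a_dict = []
  (0...num_buckets).each { a_dict.push([]) }
  a_dict
end

# Hypothetical helper: the block form suggested in the comment.
# The block runs once per slot, so every bucket is a distinct array.
def build_with_block(num_buckets = 256)
  Array.new(num_buckets) { [] }
end

loop_dict  = build_with_loop(4)
block_dict = build_with_block(4)
puts loop_dict == block_dict   # both are four separate empty arrays

# Pitfall: Array.new(n, []) puts the SAME array object in every slot,
# so pushing into one "bucket" appears to change all of them.
shared = Array.new(4, [])
shared[0] << :x
puts shared[3].inspect
```

The block form matters here because each bucket has to be an independent array; the two-argument form without a block would make all buckets share one object.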

1 Answer


The code is meant to implement a data structure called a hash table. It is the data structure behind Ruby's built-in Hash class.

Hash tables use the hash of a key as an index. Because there is only a limited number of possible indexes, collisions (i.e., different keys mapping to the same index) happen. Separate chaining is one common method of collision resolution: keys are inserted into buckets, and num_buckets here is the number of buckets. Different keys that map to the same index end up in the same bucket.
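
To make the bucket idea concrete, here is a minimal sketch of how a key could be stored and looked up with separate chaining. The module name `BucketSketch` and its methods `create`, `bucket_for`, `set`, and `get` are hypothetical names chosen for illustration, not the rest of the asker's module, and choosing the bucket with `key.hash % number_of_buckets` is an assumption (it is one common choice).

```ruby
module BucketSketch
  # Same idea as Dict.new in the question: an array of empty buckets.
  def self.create(num_buckets = 256)
    Array.new(num_buckets) { [] }
  end

  # Pick a bucket: hash the key, then wrap it into 0...number_of_buckets.
  def self.bucket_for(dict, key)
    dict[key.hash % dict.length]
  end

  # Store a pair; replace the value if the key is already in the bucket.
  def self.set(dict, key, value)
    bucket = bucket_for(dict, key)
    pair = bucket.find { |k, _| k == key }
    if pair
      pair[1] = value
    else
      bucket.push([key, value])
    end
  end

  # Look up a key by scanning only its own bucket.
  def self.get(dict, key, default = nil)
    pair = bucket_for(dict, key).find { |k, _| k == key }
    pair ? pair[1] : default
  end
end

dict = BucketSketch.create(2)   # only 2 buckets, so 3 keys must collide
BucketSketch.set(dict, "apple", 1)
BucketSketch.set(dict, "pear", 2)
BucketSketch.set(dict, "plum", 3)
BucketSketch.set(dict, "apple", 10)   # overwrites the earlier value
puts BucketSketch.get(dict, "apple")
```

With only two buckets, at least two of the three keys share a bucket, yet each lookup still finds the right value because the bucket is scanned for the exact key.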

An image illustrating separate chaining, from Wikipedia:

[Image: hash table with separate chaining]

Yu Hao
  • Yu, not having a computer science background, this is something I didn't know. Interesting. What happens when a bucket becomes full? (Will buckets have buckets?) I'm not sure the OP understands the module is not to be mixed into a class or how the module method is used (e.g., `Dict.new(512)`, I presume). – Cary Swoveland Jun 20 '15 at 19:19
  • Thanks Yu this was very useful! – mav91 Jun 20 '15 at 19:39
  • @CarySwoveland With this *separate chaining* implementation, a bucket is never really *full*, as each bucket is a linked list. However, if there are too many keys, the performance of the hash table becomes poor and **rehashing** happens: for instance, 512 buckets become 1024 buckets, and a new bucket index is calculated for every key according to the new number of buckets. With the other common implementation, *open addressing*, the bucket array can become full, so rehashing must be done before that. – Yu Hao Jun 21 '15 at 00:38
  • Thanks for the explanation, Yu. – Cary Swoveland Jun 21 '15 at 07:18
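
The rehashing described in the comments above can be sketched as follows. This is a minimal illustration that assumes buckets hold `[key, value]` pairs and that a key's bucket index is `key.hash % number_of_buckets`; the method name `rehash` and the default of doubling the bucket count are assumptions, not anything taken from the asker's module.

```ruby
# Hypothetical helper: build a larger table and re-place every pair,
# because a key's bucket index depends on the number of buckets.
def rehash(old_dict, new_num_buckets = old_dict.length * 2)
  new_dict = Array.new(new_num_buckets) { [] }
  old_dict.each do |bucket|
    bucket.each do |key, value|
      new_dict[key.hash % new_num_buckets].push([key, value])
    end
  end
  new_dict
end

# A tiny table with 2 buckets holding 4 pairs.
small = Array.new(2) { [] }
%w[apple pear plum fig].each_with_index do |key, i|
  small[key.hash % small.length].push([key, i])
end

bigger = rehash(small)   # 2 buckets become 4; all 4 pairs are kept
puts bigger.length
```

Every pair has to be moved rather than copied bucket by bucket, since the same key can land at a different index once the bucket count changes.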