4

How can I create a consistent hash of a Ruby array full of strings? The requirement is that the hash is always the same if the array contains the same values, independent of their order.

>> a = ["a", "b", "c", "d"]
>> SomeModule.hash(a)
=> "2aae6c35c94fcfb415dbe95f408b9ce91ee846ed"
>>
>> b = ["d", "b", "c", "a"]
>> SomeModule.hash(b)
=> "2aae6c35c94fcfb415dbe95f408b9ce91ee846ed"
>>
>> SomeModule.hash(a) == SomeModule.hash(b)
=> true

Zlib and Digest only work on strings, so I always had to sort the array and join it to get this working.

So is there anything better?

Thomas Fankhauser

4 Answers

5

You can convert your array to a Set with `to_set` and call `hash` on it (don't forget to `require 'set'`):

a = ["a", "b", "c", "d"]
a.to_set.hash # => 425494174200536878

b = ["d", "b", "c", "a"]
b.to_set.hash # => 425494174200536878
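
A small sketch of what this approach relies on, using illustrative variable names that are not from the answer. One caveat: as far as I know, `Object#hash` values are only stable within a single Ruby process (string hashing is seeded at startup), so the integer is fine for comparisons inside one run but probably not something to store or compare across runs.

```ruby
require 'set'

a = ["a", "b", "c", "d"]
b = ["d", "b", "c", "a"]
with_dupes = ["a", "a", "b", "c", "d"]

# Sets compare by membership, so element order does not matter.
same_members = a.to_set == b.to_set
same_hash    = a.to_set.hash == b.to_set.hash

# Duplicates collapse when converting to a Set.
dupes_ignored = with_dupes.to_set.hash == a.to_set.hash

puts same_members, same_hash, dupes_ignored
```

If duplicates should count as different input, a Set is the wrong tool and a sorted array is the safer starting point.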
Vasiliy Ermolovich
3

You can just sort the array, concatenate all elements into a string and hash it.

require 'digest'

# Sort first so that element order does not affect the result.
def array_digest(array)
  Digest::SHA1.hexdigest(array.sort.join)
end
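
One thing a plain `join` does not handle is ambiguity between elements: `["ab", "c"]` and `["a", "bc"]` join to the same string. Here is a minimal sketch that length-prefixes each element before hashing. The method name `array_digest` and the `"length:value"` encoding are my own assumptions, not part of the answer; the question's `SomeModule` is reused only as a namespace.

```ruby
require 'digest'

module SomeModule
  # Hypothetical helper name; the question calls it SomeModule.hash.
  def self.array_digest(strings)
    # Length-prefix each element so ["ab", "c"] and ["a", "bc"] differ.
    canonical = strings.sort.map { |s| "#{s.bytesize}:#{s}" }.join
    Digest::SHA1.hexdigest(canonical)
  end
end

a = ["a", "b", "c", "d"]
b = ["d", "b", "c", "a"]
puts SomeModule.array_digest(a) == SomeModule.array_digest(b) # prints true
```

If duplicates should be ignored, as with a Set, calling `uniq` before `sort` would be one way to get that.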
iltempo
  • I see your last sentence now. But I'm not sure you can get around sorting if the order of the elements should not matter. – iltempo Oct 19 '12 at 08:28
  • If you convert your array to a set, you are going to lose duplicate elements: http://www.ruby-doc.org/stdlib-1.9.3/libdoc/set/rdoc/Set.html. I don't know if this is what you want. – iltempo Oct 19 '12 at 08:38
  • That's, in my case, even the desired behavior, so that would be totally acceptable. – Thomas Fankhauser Oct 19 '12 at 09:47
  • It seems that, as with sort, this is the only really "consistent" way to do this, so thanks! – Thomas Fankhauser Oct 22 '12 at 02:35
0

There is already a standard library called set that provides the Set class. You can also easily implement it yourself. Instead of an array like this:

["a", "b", "c", "d"]

keep it as a hash:

{"a" => true, "b" => true, "c" => true, "d" => true}
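
To illustrate why this works, here is a rough sketch: Ruby hashes compare by content rather than insertion order, so two membership hashes built from differently ordered arrays are equal and can stand in for each other as keys. The helper name `to_membership_hash` is made up for this example.

```ruby
# Hypothetical helper: turn an array of strings into a membership hash.
def to_membership_hash(strings)
  strings.each_with_object({}) { |s, h| h[s] = true }
end

h1 = to_membership_hash(["a", "b", "c", "d"])
h2 = to_membership_hash(["d", "b", "c", "a"])

# Equality ignores insertion order.
puts h1 == h2

# Equal hashes also work interchangeably as keys of another hash.
cache = { h1 => :seen }
puts cache[h2].inspect
```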
sawa
0

It seems that the behaviour of `.hash` across runs is being overlooked: its value is only stable within a single Ruby process. The following hashes the `inspect` output with a digest instead:

a = %w(alpha bravo charlie delta echo foxtrot gulf hotel india july kilo sheep)
c = %w(alpha bravo charlie delta echo foxtrot gulf hotel india july kilo sheep)

require 'ap'      # provided by the awesome_print gem
require 'digest'

ap a
ap c

def h2(value)
    Digest::SHA512.hexdigest value.inspect
end

aa = h2(a)
cc = h2(c)
ap aa
ap cc

ap "they are equal" if aa==cc
puts

# Assumed stand-in for Turtle: an object with two attributes, name and colour
Turtle = Struct.new(:name, :colour)
t = Turtle.new("Tom", "blue")
ap h2(t)

t.name = "Teth"
ap h2(t)

t.name = "Tom"
ap h2(t)
puts

t.colour = "ruby red"
ap h2(t)

t.colour = "blue"
ap h2(t)
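
Note that a digest of `inspect` still depends on element order, which the question wants to avoid. A short sketch of that, reusing `h2` from above with two made-up arrays `x` and `y`, and sorting as one possible way around it:

```ruby
require 'digest'

def h2(value)
  Digest::SHA512.hexdigest value.inspect
end

x = %w(alpha bravo charlie)
y = %w(charlie alpha bravo)

puts h2(x) == h2(y)            # false: order changes the inspect string
puts h2(x.sort) == h2(y.sort)  # true: sorting first removes that dependence
```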
Rich_F