I have this array, in a Ruby 1.8.6 console:
arr = [{:foo => "bar"}, {:foo => "bar"}]
Both elements are equal to each other:
arr[0] == arr[1]
=> true
#just in case there's some "==" vs "===" oddness...
arr[0] === arr[1]
=> true
But arr.uniq doesn't remove the duplicates:
arr.uniq
=> [{:foo=>"bar"}, {:foo=>"bar"}]
Can anyone tell me what's going on here?
EDIT: I can write a not-very-clever uniqifier that uses include?, as follows:
uniqed = []
arr.each do |hash|
  unless uniqed.include?(hash)
    uniqed << hash
  end
end; false  # "; false" just stops irb from echoing the whole array back
uniqed
=> [{:foo=>"bar"}]
This produces the correct result, which makes the failure of uniq even more mysterious.
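The split between the two approaches can be reproduced in any Ruby version: include? compares elements with ==, whereas uniq builds an internal hash table and so relies on hash and eql? instead. The Box class below is hypothetical, made up for illustration; it defines only ==, much like Hash in 1.8.6, and leaves eql? and hash at their identity-based Object defaults:

```ruby
# Box is a made-up class: it defines a content-based ==, but leaves
# eql? and hash at their Object defaults (object identity).
class Box
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Content-based equality, similar to Hash#== in Ruby 1.8.6
  def ==(other)
    other.is_a?(Box) && value == other.value
  end
end

boxes = [Box.new("bar"), Box.new("bar")]

# include? uses ==, so the hand-rolled uniqifier collapses the duplicates
manual = []
boxes.each { |b| manual << b unless manual.include?(b) }

# uniq uses hash/eql?, which still only compare object identity here
builtin = boxes.uniq

puts manual.size   # 1
puts builtin.size  # 2
```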
EDIT 2: Some notes on what's going on, possibly just for my own clarity. As @Ajedi32 points out in the comments, the failure to uniqify comes from the fact that the two elements are different objects, and uniq compares elements with eql? and hash rather than with ==. Some classes define their eql? and hash methods to mean "are these effectively the same thing, even if they're not the same object in memory?". String does this, for example, which is why you can set two variables to "foo" and they are said to be equal to one another, even though they're not the same object.
The Hash class doesn't do this in Ruby 1.8.6, so when .eql? and .hash are called on a hash object (the .hash method has nothing to do with the Hash data type; it's the checksum kind of hash), they fall back to the methods defined in the Object base class, which simply ask "is it the same object in memory?".
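The difference between String's overrides and the Object defaults can be checked directly. Plain Object instances stand in for the 1.8.6 Hash behaviour in this sketch, since current Ruby versions already give Hash content-based eql? and hash. String.new is used so the two strings are guaranteed to be separate objects even if string literals happen to be frozen and deduplicated:

```ruby
# Two separately created strings: different objects, same contents
s1 = String.new("foo")
s2 = String.new("foo")
puts s1.equal?(s2)        # false: not the same object in memory
puts s1.eql?(s2)          # true: String overrides eql? ...
puts s1.hash == s2.hash   # true: ... and hash, consistently with it

# Two plain Objects fall back to Object#eql?, which is identity
o1 = Object.new
o2 = Object.new
puts o1.eql?(o2)          # false
puts o1.eql?(o1)          # true
```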
The == and === operators, for hash objects, already do what I want, i.e. they say two hashes are the same if their contents are the same. I've overridden Hash#eql? to use these, like so:
class Hash
  # Treat two hashes as eql? whenever they are == (same contents)
  def eql?(other_hash)
    self == other_hash
  end
end
But I'm not sure how to handle Hash#hash: that is, I don't know how to generate a checksum which will be the same for two hashes whose contents are the same and always different for two hashes with different contents.
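For what it's worth, Ruby's requirement on hash only runs one way: objects that are eql? must return the same hash value, but objects with different contents are allowed to collide. A collision just means eql? is called to break the tie, which costs speed but not correctness. The two Pair classes below are hypothetical; they suggest that defining eql? alone does not make uniq work, while even a deliberately terrible constant hash does:

```ruby
# Defines eql? but not hash: uniq does not treat the two as duplicates
class PairEqlOnly
  attr_reader :a, :b

  def initialize(a, b)
    @a, @b = a, b
  end

  def eql?(other)
    other.is_a?(PairEqlOnly) && a == other.a && b == other.b
  end
end

# Defines both; the constant hash is legal but makes every pair collide
class PairConstantHash
  attr_reader :a, :b

  def initialize(a, b)
    @a, @b = a, b
  end

  def eql?(other)
    other.is_a?(PairConstantHash) && a == other.a && b == other.b
  end

  def hash
    42 # same for every instance: correct, just slow for big collections
  end
end

eql_only = [PairEqlOnly.new(1, 2), PairEqlOnly.new(1, 2)].uniq
constant = [PairConstantHash.new(1, 2), PairConstantHash.new(1, 2),
            PairConstantHash.new(3, 4)].uniq

# 2: the Object#hash values (almost certainly) differ, so eql? is never consulted
puts eql_only.size
# 2: the two (1, 2) pairs merge, (3, 4) survives
puts constant.size
```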
@Ajedi32 suggested I have a look at Rubinius' implementation of the Hash#hash method here https://github.com/rubinius/rubinius/blob/master/core/hash.rb#L589 , and my version of Rubinius' implementation looks like this:
class Hash
  def hash
    result = self.size
    self.each do |key, value|
      result ^= key.hash
      result ^= value.hash
    end
    return result
  end
end
And this does seem to work, although I don't know what the "^=" operator does, which makes me a bit nervous. Also, it's very slow: about 50 times slower, based on some primitive benchmarking. This might make it too slow to use.
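One property of the XOR-based method that seems to matter here: XOR is commutative and associative, so the result does not depend on the order in which each yields the pairs, and two hashes built by inserting the same pairs in a different order get the same value. The function xor_content_hash below is a hypothetical standalone rewrite of the Hash#hash override above (it avoids monkey-patching Hash so it can run on a current Ruby). It also shows a weakness: because keys and values contribute symmetrically, swapping a key with its value produces the same result, which is a legal collision rather than a bug:

```ruby
# Hypothetical standalone version of the Hash#hash override above
def xor_content_hash(h)
  result = h.size
  h.each do |key, value|
    result ^= key.hash   # same as: result = result ^ key.hash
    result ^= value.hash
  end
  result
end

h1 = { :foo => "bar", :baz => 1 }
h2 = { :baz => 1, :foo => "bar" }  # same pairs, different insertion order

puts xor_content_hash(h1) == xor_content_hash(h2)  # true

# Weakness: key and value are treated symmetrically, so these collide
puts xor_content_hash({ :a => :b }) == xor_content_hash({ :b => :a })  # true
```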
EDIT 3: A bit of research has revealed that "^" is the bitwise exclusive OR (XOR) operator. Given two input bits, XOR returns 1 if they are different and 0 if they are the same (i.e. it returns 0 for 0,0 and 1,1, and 1 for 0,1 and 1,0).
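A quick Ruby check of that description, using only Integer#^ (expected output in the comments). On whole integers, ^ applies the single-bit rule to each bit position independently, and the last three lines show the properties that make XOR convenient for combining hash values:

```ruby
# Truth table for single bits
[[0, 0], [0, 1], [1, 0], [1, 1]].each do |x, y|
  puts "#{x} ^ #{y} = #{x ^ y}"
end
# 0 ^ 0 = 0
# 0 ^ 1 = 1
# 1 ^ 0 = 1
# 1 ^ 1 = 0

# On integers, ^ works bit by bit: 0b1100 ^ 0b1010 = 0b0110
puts 0b1100 ^ 0b1010         # 6

# Properties the hashing code relies on
x = 12345
puts((x ^ x) == 0)           # true: a value cancels itself out
puts((x ^ 0) == x)           # true: 0 leaves a value unchanged
puts((3 ^ 5) == (5 ^ 3))     # true: order doesn't matter
```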
So, at first I thought that means that
result ^= key.hash
is shorthand for
result = result ^ key.hash
In other words, do an XOR between the current value of result and the other thing, and then save that in result. I still don't quite get the logic of this, though. I thought that perhaps the ^ operator had something to do with pointers, because calling it on variables works while calling it on the value of the variable doesn't, e.g.:
var = 1
=> 1
var ^= :foo
=> 14904
1 ^= :foo
SyntaxError: compile error
(irb):11: syntax error, unexpected tOP_ASGN, expecting $end
So it's fine with calling ^= on a variable but not on the value of the variable, which made me think it's something to do with referencing/dereferencing.
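A simpler explanation than pointers may fit the irb session above: ^= is an assignment operator like +=, so a ^= b expands to a = a ^ b. The plain ^ operator is happy with literal values; what the parser rejects is assigning the result back into a literal such as 1. This sketch uses integers rather than a Symbol because XOR with a Symbol, as in the session above, does not work on current Ruby versions:

```ruby
a = 5
a ^= 3          # expands to: a = a ^ 3
puts a          # 6  (0b101 ^ 0b011 = 0b110)

puts 5 ^ 3      # 6: ^ itself works fine on literal values

# "1 ^= 3" would mean "1 = 1 ^ 3", i.e. assigning to a literal,
# which the parser rejects before anything runs
rejected = nil
begin
  eval("1 ^= 3")
rescue SyntaxError => e
  rejected = e.class
end
puts rejected   # SyntaxError
```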
Later versions of Ruby have C code for the Hash#hash method, and Rubinius' implementation seems too slow. Bit stuck...
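For anyone reading this on a newer Ruby: current versions define Hash#eql? and Hash#hash in terms of the hash's contents, so the original example behaves as expected without any patching. The check below therefore will not reproduce the 1.8.6 behaviour described above; it only shows what the built-in methods do now (the exact printed form of the array differs between Ruby versions, so no output is given for the last line):

```ruby
arr = [{ :foo => "bar" }, { :foo => "bar" }]

puts arr[0].equal?(arr[1])        # false: still two distinct objects
puts arr[0].eql?(arr[1])          # true: content-based eql?
puts arr[0].hash == arr[1].hash   # true: content-based hash
p arr.uniq                        # a single-element array
```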