
In the version of Ruby I'm using (1.8.6, don't ask), the Hash class doesn't define Hash#hash, so calling uniq on an array of hashes doesn't test whether their contents are the same; it tests whether they are the same object (using the default Object#hash method).
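As a rough illustration of that behaviour on a current Ruby (where Hash does define hash, so the original problem can't be reproduced directly), the sketch below uses a hypothetical Point class that defines == but not hash or eql?. The class and variable names are made up for the example. uniq treats the two equal-looking objects as distinct until hash and eql? are supplied, whereas include? only needs ==.

```ruby
# Hypothetical class, for illustration only: it defines == but not hash/eql?.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def ==(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
end

points = [Point.new(1, 2), Point.new(1, 2)]

# uniq relies on hash and eql?, which still come from Object here,
# so the two points are treated as different elements.
size_without_hash = points.uniq.size

# include? compares with ==, so it does see the content match.
found_by_include = [Point.new(1, 2)].include?(Point.new(1, 2))

# Reopen the class and add content-based hash and eql?.
class Point
  def hash
    [x, y].hash
  end

  def eql?(other)
    self == other
  end
end

size_with_hash = points.uniq.size
```

Here size_without_hash should be 2 and size_with_hash should be 1, which mirrors the difference between a Hash class without and with its own hash method.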

To get around this, I can use include?, like so:

hashes = <a big list of hashes>
uniq_hashes = []
hashes.each do |hash|
  unless uniq_hashes.include?(hash)
    uniq_hashes << hash
  end
end
uniq_hashes.size

Can anyone think of a way to condense this into a one-line method?

Max Williams

3 Answers


Can you use each_with_object?

hashes = [{title: 'a'}, {title: 'b'}, {title: 'c'}, {title: 'a'}]
p hashes.each_with_object([]) { |el, array| array << el unless array.include? el }.size
# 3
Sebastián Palma
  • That's great, I'd not seen `each_with_object` before - handy! – Max Williams Nov 15 '17 at 11:52
  • Sorry, I just took the accepted status off this and gave it to @CarySwoveland, since his answer (not strictly one line, but easily changed to one line) runs in a tiny fraction of the time for large arrays. Sorry if that's a breach of SE protocol! – Max Williams Nov 16 '17 at 09:11
hashes = <a big list of hashes>
uniq_hashes = []
hashes.each do |hash|
  unless uniq_hashes.include?(hash)
    uniq_hashes << hash
  end
end;uniq_hashes.size

Can anyone think of a way to condense this into a one-line method?

Easy:

hashes = <a big list of hashes>; uniq_hashes = []; hashes.each do |hash| unless uniq_hashes.include?(hash) then uniq_hashes << hash end end;uniq_hashes.size

In fact, you can condense almost any Ruby code into one line, since newlines are nearly always optional (heredocs are one exception). A newline can usually be replaced with a semicolon, a separator keyword such as then or do, or nothing at all.

Jörg W Mittag

Rather than using include? to check if each hash matches a previously-examined hash, one can speed things up by making use of a set. Recall that a set is implemented with a hash under the covers, which explains why lookups are so fast.

require 'set'

def uniq_hashes(arr)
  st = Set.new
  arr.select { |h| st.add?(h) }
end

uniq_hashes [{ a: 1, b: 2 }, { b: 2, a: 1 }, { a: 1, c: 2 }]
  #=> [{:a=>1, :b=>2}, {:a=>1, :c=>2}]

See Set#add?.
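For the one-line form the question asks for, the same idea can be written without the method wrapper. This is only a sketch: the sample data and the names seen, uniq_by_set and uniq_by_include are invented for the example, and it assumes a Ruby where Hash defines content-based hash and eql? (1.8.7 or later; the symbol-key syntax below needs 1.9 or later).

```ruby
require 'set'

# Sample data, made up for the example.
hashes = [{ a: 1, b: 2 }, { b: 2, a: 1 }, { a: 1, c: 2 }, { a: 1, c: 2 }]

# One line: add? returns nil when the element is already in the set,
# so select keeps only the first occurrence of each hash.
seen = Set.new; uniq_by_set = hashes.select { |h| seen.add?(h) }

# The include?-based version from the question, for comparison.
uniq_by_include = []
hashes.each { |h| uniq_by_include << h unless uniq_by_include.include?(h) }
```

Both approaches should return the same two hashes in the same order; the difference is only in how much work each lookup does.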

Cary Swoveland
  • Wow, I just benchmarked that with an array of 88,000 large hashes, and it's MUCH faster than the answer by @SebastiánPalma that I had already marked as correct. That's to be expected I guess, since the number of ops in the `include?` approach will grow quadratically with the array size. I've never really used sets before, I'll look into them a bit. Thanks a lot. – Max Williams Nov 16 '17 at 09:10
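To put a number on how the `include?` approach scales, here is a small sketch that counts == calls rather than measuring time. The Counted wrapper and comparisons_for helper are hypothetical, written only for this illustration. With n distinct elements (the worst case), the k-th element is compared against the k - 1 already kept, giving n(n - 1)/2 comparisons in total, so doubling the array size roughly quadruples the work.

```ruby
# Hypothetical wrapper that counts == calls, for illustration only.
class Counted
  attr_reader :value

  @comparisons = 0
  class << self
    attr_accessor :comparisons
  end

  def initialize(value)
    @value = value
  end

  def ==(other)
    Counted.comparisons += 1
    value == other.value
  end
end

# Runs the include?-based dedupe on n distinct items and
# returns how many == calls it made.
def comparisons_for(n)
  Counted.comparisons = 0
  items = (1..n).map { |i| Counted.new(i) }
  uniq = []
  items.each { |item| uniq << item unless uniq.include?(item) }
  Counted.comparisons
end

small = comparisons_for(100) # 100 * 99 / 2 comparisons
large = comparisons_for(200) # 200 * 199 / 2 comparisons
```

A set-based version does roughly one hash lookup per element instead, which is consistent with the large speed-up reported for the 88,000-element array.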