5

I have a file with one number per line:

0101
1010
1311
0101
1311
431
1010
431
420

I want to have a hash with the number of occurrences of each number, in this case:

{0101 => 2, 1010 => 2, 1311 => 2, 431 => 2, 420 => 1}

How can I do this?

the Tin Man
josh
  • I think I found the same question with different wording :) [count duplicate elements in ruby array](http://stackoverflow.com/questions/569694/count-duplicate-elements-in-ruby-array) – Matchu Nov 29 '10 at 01:17
  • FYI: If it happens to be Rails, you can use Enumerable#group_by. See http://api.rubyonrails.org/classes/Enumerable.html#method-i-group_by – Mark Thomas Nov 29 '10 at 16:36
  • Enumerable#group_by is not just for Rails: It's in Ruby 1.9, and has been backported to 1.8.7 – Wayne Conrad Nov 29 '10 at 17:02
  • `0101` isn't a number, it's a string. – Andrew Grimm May 02 '11 at 23:53

3 Answers

11

Simple one-liner, given an array items:

items.inject(Hash.new(0)) {|hash, item| hash[item] += 1; hash}

How it works:

Hash.new(0) creates a new Hash where accessing undefined keys returns 0.

inject(foo) iterates through the array, calling the block once per element. On the first iteration it passes foo as the accumulator; on each later iteration it passes the return value of the previous iteration. That is why the block has to end with hash.
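To make the accumulator hand-off concrete, here is a small sketch of the one-liner spread over several lines. It assumes the lines of the file have already been read into an array of strings named items (the sample values are copied from the question).

```ruby
# Sample data from the question, kept as strings so "0101" keeps its leading zero.
items = %w[0101 1010 1311 0101 1311 431 1010 431 420]

# Hash.new(0) returns 0 for a missing key, so += 1 works the first time an item is seen.
counts = items.inject(Hash.new(0)) do |hash, item|
  hash[item] += 1
  hash # the block's return value becomes `hash` on the next iteration
end

# Without the final `hash` line, the next iteration would receive the Integer
# returned by `+= 1` instead of the Hash.
# counts == { "0101" => 2, "1010" => 2, "1311" => 2, "431" => 2, "420" => 1 }
```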

Another way to write it would be:

hash = Hash.new(0)
items.each {|item| hash[item] += 1}
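
Since the question starts from a file rather than an array, the following sketch shows one way the items array might be built. The temporary file is only a stand-in for the real input (its name and contents are assumptions made for the example), and the lines are kept as strings so that "0101" is not turned into a different value.

```ruby
require "tempfile"

# Hypothetical sample file standing in for the question's input.
file = Tempfile.new("numbers")
file.write("0101\n1010\n1311\n0101\n1311\n431\n1010\n431\n420\n")
file.close

# Read every line and strip the trailing newline.
items = File.readlines(file.path).map(&:chomp)

# Count with a plain loop, as in the second form above.
hash = Hash.new(0)
items.each { |item| hash[item] += 1 }

file.unlink # remove the temporary file
# hash == { "0101" => 2, "1010" => 2, "1311" => 2, "431" => 2, "420" => 1 }
```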
Chuck
  • I tend to do it the second way @Chuck showed because then I don't have to explain it, though I prefer using `inject()`. – the Tin Man Nov 29 '10 at 02:32
  • `each_with_object` should now be the preferred, cleaner solution; with `inject` it is very easy to forget the trailing `; hash`. – notapatch Sep 02 '15 at 11:09
4

This is essentially the same as Chuck's, but when you are building an array or hash, each_with_object is slightly simpler than inject, because you do not have to return the array or hash at the end of the block.

items.each_with_object(Hash.new(0)) {|item, hash| hash[item] += 1}
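
A short sketch of the difference, again assuming items is an array of strings taken from the question. It also shows Enumerable#tally, which is not part of this answer: it was added in Ruby 2.7 and does the same counting in one call, so the example only uses it when the running Ruby provides it.

```ruby
items = %w[0101 1010 1311 0101 1311 431 1010 431 420]

# Note the argument order: |item, hash| here, versus |hash, item| for inject.
# The block's return value is ignored; each_with_object returns the hash itself.
counts = items.each_with_object(Hash.new(0)) { |item, hash| hash[item] += 1 }

# Ruby 2.7+ only: tally counts occurrences directly. Fall back to `counts` on older Rubies.
tallied = items.respond_to?(:tally) ? items.tally : counts

# counts == tallied
# counts == { "0101" => 2, "1010" => 2, "1311" => 2, "431" => 2, "420" => 1 }
```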
sawa
2
ID = -> x { x } # Why is the identity function not in the core lib?

f = <<-HERE
  0101
  1010
  1311
  0101
  1311
  431
  1010
  431
  420
HERE

Hash[f.lines.map(&:to_i).group_by(&ID).map {|n, ns| [n, ns.size] }]
# => { 101 => 2, 1010 => 2, 1311 => 2, 431 => 2, 420 => 1 }

You simply group the numbers by themselves using Enumerable#group_by, which gives you something like

{ 101 => [101, 101], 420 => [420] }

Then you use Enumerable#map to replace each value array with its length, i.e. [101, 101] becomes 2. Finally, convert the result back to a Hash using Hash::[].
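
The same group-then-count idea can be sketched on newer Rubies without a hand-written identity lambda. This variant is an adaptation rather than the code above: it assumes Ruby 2.4 or later (Object#itself arrived in 2.2, Hash#transform_values in 2.4), and it keeps the values as strings instead of calling to_i, so "0101" retains its leading zero.

```ruby
items = %w[0101 1010 1311 0101 1311 431 1010 431 420]

# Step 1: group equal elements together, keyed by the element itself.
groups = items.group_by(&:itself)
# groups["0101"] == ["0101", "0101"]
# groups["420"]  == ["420"]

# Step 2: replace each group with its size.
counts = groups.transform_values(&:size)
# counts == { "0101" => 2, "1010" => 2, "1311" => 2, "431" => 2, "420" => 1 }
```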

However, if you are willing to use a third-party library, it becomes even more trivial, because if you use a MultiSet data structure, the answer falls out naturally. (A MultiSet is like a Set, except that an item can be added multiple times and the MultiSet will keep count of how often an item was added – which is exactly what you want.)

require 'multiset' # Google for it, it's so old that it isn't available as a Gem

Multiset[*f.lines.map(&:to_i)]
# => #<Multiset:#2 101, #2 1010, #2 1311, #2 431, #1 420>

Yes, that's it.

That's the beautiful thing about using the right data-structure: your algorithms become massively simpler. Or, in this particular case, the algorithm just vanishes.
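
To illustrate why the counting "vanishes", here is a deliberately tiny multiset sketch. TinyMultiset and its methods are hypothetical names invented for this example; this is not the API of the multiset library required above, only a rough picture of what such a structure keeps track of internally.

```ruby
# Hypothetical, minimal multiset: remembers how many times each item was added.
class TinyMultiset
  def initialize(items = [])
    @counts = Hash.new(0) # missing items count as 0
    items.each { |item| add(item) }
  end

  # Add one occurrence of item; returns self so calls can be chained.
  def add(item)
    @counts[item] += 1
    self
  end

  # How many times item has been added (0 if never).
  def count_of(item)
    @counts.fetch(item, 0)
  end

  # A copy of the item => count mapping.
  def to_h
    @counts.dup
  end
end

bag = TinyMultiset.new(%w[0101 1010 1311 0101 1311 431 1010 431 420])
# bag.count_of("0101") == 2
# bag.count_of("420")  == 1
# bag.count_of("999")  == 0
```

Building the structure is the whole algorithm: once the items are in the bag, the hash the question asks for is just bag.to_h.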

I've written more about using MultiSets for solving this exact problem at

the Tin Man
Jörg W Mittag
  • "Why is the identity function not in the core lib?" - because neither of us has suggested it should be in the core lib? – Andrew Grimm Mar 09 '11 at 22:29
  • Maybe we could ask `tap` to accept the absence of a block. http://stackoverflow.com/questions/6308470/ruby-method-that-returns-itself/6309488#6309488 – Andrew Grimm Jun 14 '11 at 00:45