125

I have the following Array = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

How do I produce a count for each identical element?

Where:
"Jason" = 2, "Judah" = 3, "Allison" = 1, "Teresa" = 1, "Michelle" = 1?

or produce a hash Where:

Where: hash = { "Jason" => 2, "Judah" => 3, "Allison" => 1, "Teresa" => 1, "Michelle" => 1 }

user398520
  • 1,553
  • 3
  • 12
  • 11
  • 8
    As of Ruby 2.7 you can use `Enumerable#tally`. More info [here](https://stackoverflow.com/questions/5128200/how-to-count-identical-string-elements-in-a-ruby-array/56493588#answer-56493588). – SRack Jun 07 '19 at 11:37

15 Answers15

186

Ruby v2.7+ (latest)

As of ruby v2.7.0 (released December 2019), the core language now includes Enumerable#tally - a new method, designed specifically for this problem:

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

names.tally
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

Ruby v2.4+ (currently supported, but older)

The following code was not possible in standard ruby when this question was first asked (February 2011), as it uses:

These modern additions to Ruby enable the following implementation:

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

names.group_by(&:itself).transform_values(&:count)
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

Ruby v2.2+ (deprecated)

If using an older ruby version, without access to the above mentioned Hash#transform_values method, you could instead use Array#to_h, which was added to Ruby v2.1.0 (released December 2013):

names.group_by(&:itself).map { |k,v| [k, v.length] }.to_h
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

For even older ruby versions (<= 2.1), there are several ways to solve this, but (in my opinion) there is no clear-cut "best" way. See the other answers to this post.

Tom Lord
  • 27,404
  • 4
  • 50
  • 77
  • I was about to post :P. Is there any discernible difference between using `count` instead of `size`/`length`? – Sagar Pandya Jan 02 '18 at 02:25
  • 1
    @SagarPandya No, there is no difference. Unlike `Array#size` and `Array#length`, `Array#count` *can* take an optional argument or block; but if used with neither then its implementation is identical. More specifically, all three methods call `LONG2NUM(RARRAY_LEN(ary))` under the hood: [count](https://github.com/ruby/ruby/blob/4769e4b76c0dcdb4ea709c48d3d9d8ab496d209d/array.c#L4594) / [length](https://github.com/ruby/ruby/blob/4769e4b76c0dcdb4ea709c48d3d9d8ab496d209d/array.c#L1917-L1918) – Tom Lord Jan 02 '18 at 11:29
  • 1
    Extra credit! Sort by count `.group_by(&:itself).transform_values(&:count).sort_by{|k, v| v}.reverse` – Abram Jun 19 '19 at 15:17
  • 2
    @Abram you can `sort_by{ |k, v| -v}`, no `reverse` needed! ;-) – Sony Santos Feb 09 '20 at 12:07
133
names.inject(Hash.new(0)) { |total, e| total[e] += 1 ;total}

gives you

{"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1} 
Mauricio
  • 5,854
  • 2
  • 28
  • 34
95
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
counts = Hash.new(0)
names.each { |name| counts[name] += 1 }
# => {"Jason" => 2, "Teresa" => 1, ....
Dylan Markow
  • 123,080
  • 26
  • 284
  • 201
28

Now using Ruby 2.2.0 you can leverage the itself method.

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
counts = {}
names.group_by(&:itself).each { |k,v| counts[k] = v.length }
# counts > {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
notapatch
  • 6,569
  • 6
  • 41
  • 45
Ahmed Fahmy
  • 488
  • 5
  • 10
  • 3
    Agree, but I slightly prefer names.group_by(&:itself).map{|k,v| [k, v.count]}.to_h so that you don't have to ever declare a hash object – Andy Day Apr 13 '17 at 17:52
  • 9
    @andrewkday Taking this one step further, ruby v2.4 added the method: `Hash#transform_values` which allows us to simplify your code even more: `names.group_by(&:itself).transform_values(&:count)` – Tom Lord Jan 01 '18 at 22:16
  • Also, this is a very subtle point (which is likely no longer relevant to future readers!), but note that your code also uses [`Array#to_h`](https://ruby-doc.org/core/Array.html#method-i-to_h) - which was added to Ruby v2.1.0 (released December 2013 - i.e. almost 3 years after the original question was asked!) – Tom Lord Jan 01 '18 at 22:33
23

Ruby 2.7+

Ruby 2.7 is introducing Enumerable#tally for this exact purpose. There's a good summary here.

In this use case:

array.tally
# => { "Jason" => 2, "Judah" => 3, "Allison" => 1, "Teresa" => 1, "Michelle" => 1 }

Docs on the features being released are here.

Tomerikoo
  • 18,379
  • 16
  • 47
  • 61
SRack
  • 11,495
  • 5
  • 47
  • 60
17

There's actually a data structure which does this: MultiSet.

Unfortunately, there is no MultiSet implementation in the Ruby core library or standard library, but there are a couple of implementations floating around the web.

This is a great example of how the choice of a data structure can simplify an algorithm. In fact, in this particular example, the algorithm even completely goes away. It's literally just:

Multiset.new(*names)

And that's it. Example, using https://GitHub.Com/Josh/Multimap/:

require 'multiset'

names = %w[Jason Jason Teresa Judah Michelle Judah Judah Allison]

histogram = Multiset.new(*names)
# => #<Multiset: {"Jason", "Jason", "Teresa", "Judah", "Judah", "Judah", "Michelle", "Allison"}>

histogram.multiplicity('Judah')
# => 3

Example, using http://maraigue.hhiro.net/multiset/index-en.php:

require 'multiset'

names = %w[Jason Jason Teresa Judah Michelle Judah Judah Allison]

histogram = Multiset[*names]
# => #<Multiset:#2 'Jason', #1 'Teresa', #3 'Judah', #1 'Michelle', #1 'Allison'>
Jörg W Mittag
  • 363,080
  • 75
  • 446
  • 653
  • Does the MultiSet concept originate from mathematics, or another programming language? – Andrew Grimm Feb 27 '11 at 22:02
  • 2
    @Andrew Grimm: Both he *word* "multiset" (de Bruijn, 1970s) and the *concept* (Dedekind 1888) originated in mathematics. `Multiset` is governed by strict mathematical rules and supports the typical set operations (union, intersection, complement, ...) in a way that is *mostly* consistent with the axioms, laws and theorems of "normal" mathematical set theory, although some important laws do *not* hold when you try to generalize them to multisets. But that's way beyond my understanding of the matter. I use them as a programming data structure, not as a mathematical concept. – Jörg W Mittag Feb 28 '11 at 03:26
  • To expand a *little* on that point: **"... in a way that is mostly consistent with the axioms ..."**: "Normal" sets are usually formally defined by a set of axioms (assumptions) called "Zermelo-Frankel set theory". However, one of these axioms: the [axiom of extensionality](https://en.wikipedia.org/wiki/Zermelo%E2%80%93Fraenkel_set_theory#1._Axiom_of_extensionality) states that a set is defined precisely by its members - e.g. `{A, A, B} = {A, B}`. This is clearly a violation of the very definition of multi-sets! – Tom Lord Jan 01 '18 at 23:46
  • ...However, without going into too much detail (as this is a software forum, not advanced maths!), one *can* formally define multi-sets mathematically via axioms for Crisp sets, the Peano axioms and other MultiSet-specific axioms. – Tom Lord Jan 01 '18 at 23:49
14

Enumberable#each_with_object saves you from returning the final hash.

names.each_with_object(Hash.new(0)) { |name, hash| hash[name] += 1 }

Returns:

=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
Lev Lukomsky
  • 6,346
  • 4
  • 34
  • 24
Anconia
  • 3,888
  • 6
  • 36
  • 65
6

This works.

arr = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
result = {}
arr.uniq.each{|element| result[element] = arr.count(element)}
Shreyas
  • 8,737
  • 7
  • 44
  • 56
  • 2
    +1 For a different approach -- although this has worse theoretical complexity -- `O(n^2)` (which will matter for some values of `n`) *and* does extra work (it has to count for "Judah" 3x, for instance)!. I would also suggest `each` instead of `map` (the map result is being discarded) –  Feb 26 '11 at 18:44
  • Thanks for that! I've changed the map to each.Also, I've uniq'ed the array before going through it. Maybe now the complexity issue is solved? – Shreyas Feb 27 '11 at 06:29
6

The following is a slightly more functional programming style:

array_with_lower_case_a = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
hash_grouped_by_name = array_with_lower_case_a.group_by {|name| name}
hash_grouped_by_name.map{|name, names| [name, names.length]}
=> [["Jason", 2], ["Teresa", 1], ["Judah", 3], ["Michelle", 1], ["Allison", 1]]

One advantage of group_by is that you can use it to group equivalent but not exactly identical items:

another_array_with_lower_case_a = ["Jason", "jason", "Teresa", "Judah", "Michelle", "Judah Ben-Hur", "JUDAH", "Allison"]
hash_grouped_by_first_name = another_array_with_lower_case_a.group_by {|name| name.split(" ").first.capitalize}
hash_grouped_by_first_name.map{|first_name, names| [first_name, names.length]}
=> [["Jason", 2], ["Teresa", 1], ["Judah", 3], ["Michelle", 1], ["Allison", 1]]
Andrew Grimm
  • 78,473
  • 57
  • 200
  • 338
  • Did I hear functional programming? +1 :-) This is definitely the best way, although it can be argued that is not memory-efficient. Notice also that Facets has a Enumerable#frequency. – tokland Oct 24 '11 at 10:20
5
names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
Hash[names.group_by{|i| i }.map{|k,v| [k,v.size]}]
# => {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
Arup Rakshit
  • 116,827
  • 30
  • 260
  • 317
5
a = [1, 2, 3, 2, 5, 6, 7, 5, 5]
a.each_with_object(Hash.new(0)) { |o, h| h[o] += 1 }

# => {1=>1, 2=>2, 3=>1, 5=>3, 6=>1, 7=>1}

Credit Frank Wambutt

narzero
  • 2,199
  • 5
  • 40
  • 73
2

Lots of great implementations here.

But as a beginner I would consider this the easiest to read and implement

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

name_frequency_hash = {}

names.each do |name|
  count = names.count(name)
  name_frequency_hash[name] = count  
end
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

The steps we took:

  • we created the hash
  • we looped over the names array
  • we counted how many times each name appeared in the names array
  • we created a key using the name and a value using the count

It may be slightly more verbose (and performance wise you will be doing some unnecessary work with overriding keys), but in my opinion easier to read and understand for what you want to achieve

Sami Birnbaum
  • 773
  • 8
  • 20
  • 2
    I don't see how that's any easier to read than the accepted answer, and it's clearly a worse design (doing lots of unnecessary work). – Tom Lord Apr 18 '18 at 17:00
  • @Tom Lord - I do agree with you on performance (I even mentioned that in my answer) - but as a beginner trying to understand the actual code and steps required, I find it helps to be more verbose and then one can refactor to improve performance and make code more declarative – Sami Birnbaum May 29 '19 at 09:07
  • 1
    I agree somewhat with @SamiBirnbaum. This is the only one that uses almost no special ruby knowledge like `Hash.new(0)`. The closest to pseudocode. That can be a good thing for readability but also doing unnecessary work can harm readability for readers who notice it because in more complex cases they will spend a little time thinking they're going crazy trying to figure out why it's done. – Adamantish Jun 07 '19 at 09:50
2

With ruby 2.6 you can do:

names.to_h{ |name| [name, names.count(name)] }

gives you:

{"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}
Tomerikoo
  • 18,379
  • 16
  • 47
  • 61
Akash Kinwad
  • 704
  • 2
  • 7
  • 22
1
arr = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

arr.uniq.inject({}) {|a, e| a.merge({e => arr.count(e)})}

Time elapsed 0.028 milliseconds

interestingly, stupidgeek's implementation benchmarked:

Time elapsed 0.041 milliseconds

and the winning answer:

Time elapsed 0.011 milliseconds

:)

jszobody
  • 28,495
  • 6
  • 61
  • 72
Alex Moore-Niemi
  • 2,913
  • 2
  • 24
  • 22
1

This is more a comment than an answer, but a comment wouldn't do it justice. If you do Array = foo, you crash at least one implementation of IRB:

C:\Documents and Settings\a.grimm>irb
irb(main):001:0> Array = nil
(irb):1: warning: already initialized constant Array
=> nil
C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:3177:in `rl_redisplay': undefined method `new' for nil:NilClass (NoMethodError)
        from C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:3873:in `readline_internal_setup'
        from C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4704:in `readline_internal'
        from C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4727:in `readline'
        from C:/Ruby19/lib/ruby/site_ruby/1.9.1/readline.rb:40:in `readline'
        from C:/Ruby19/lib/ruby/1.9.1/irb/input-method.rb:115:in `gets'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:139:in `block (2 levels) in eval_input'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:271:in `signal_status'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:138:in `block in eval_input'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:189:in `call'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:189:in `buf_input'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:103:in `getc'
        from C:/Ruby19/lib/ruby/1.9.1/irb/slex.rb:205:in `match_io'
        from C:/Ruby19/lib/ruby/1.9.1/irb/slex.rb:75:in `match'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:287:in `token'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:263:in `lex'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:234:in `block (2 levels) in each_top_level_statement'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:230:in `loop'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:230:in `block in each_top_level_statement'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `catch'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `each_top_level_statement'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:153:in `eval_input'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:70:in `block in start'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:69:in `catch'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:69:in `start'
        from C:/Ruby19/bin/irb:12:in `<main>'

C:\Documents and Settings\a.grimm>

That's because Array is a class.

Andrew Grimm
  • 78,473
  • 57
  • 200
  • 338