How to count identical string elements in a Ruby array

Question

I have the following Array = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

How do I produce a count for each identical element?

Where:
"Jason" = 2, "Judah" = 3, "Allison" = 1, "Teresa" = 1, "Michelle" = 1?

or produce a hash Where:

Where: hash = { "Jason" => 2, "Judah" => 3, "Allison" => 1, "Teresa" => 1, "Michelle" => 1 }

As of Ruby 2.7 you can use `Enumerable#tally`. More info [here](https://stackoverflow.com/questions/5128200/how-to-count-identical-string-elements-in-a-ruby-array/56493588#answer-56493588). — SRack, Jun 07 '19 at 11:37

Tom Lord · Answer 1 · 2019-12-27T19:05:59.833

Ruby v2.7+ (latest)

As of ruby v2.7.0 (released December 2019), the core language now includes Enumerable#tally - a new method, designed specifically for this problem:

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

names.tally
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

Ruby v2.4+ (currently supported, but older)

The following code was not possible in standard ruby when this question was first asked (February 2011), as it uses:

Object#itself, which was added to Ruby v2.2.0 (released December 2014).
Hash#transform_values, which was added to Ruby v2.4.0 (released December 2016).

These modern additions to Ruby enable the following implementation:

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

names.group_by(&:itself).transform_values(&:count)
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

Ruby v2.2+ (deprecated)

If using an older ruby version, without access to the above mentioned Hash#transform_values method, you could instead use Array#to_h, which was added to Ruby v2.1.0 (released December 2013):

names.group_by(&:itself).map { |k,v| [k, v.length] }.to_h
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

For even older ruby versions (<= 2.1), there are several ways to solve this, but (in my opinion) there is no clear-cut "best" way. See the other answers to this post.

I was about to post :P. Is there any discernible difference between using `count` instead of `size`/`length`? — Sagar Pandya, Jan 02 '18 at 02:25
@SagarPandya No, there is no difference. Unlike `Array#size` and `Array#length`, `Array#count` *can* take an optional argument or block; but if used with neither then its implementation is identical. More specifically, all three methods call `LONG2NUM(RARRAY_LEN(ary))` under the hood: [count](https://github.com/ruby/ruby/blob/4769e4b76c0dcdb4ea709c48d3d9d8ab496d209d/array.c#L4594) / [length](https://github.com/ruby/ruby/blob/4769e4b76c0dcdb4ea709c48d3d9d8ab496d209d/array.c#L1917-L1918) — Tom Lord, Jan 02 '18 at 11:29
Extra credit! Sort by count `.group_by(&:itself).transform_values(&:count).sort_by{|k, v| v}.reverse` — Abram, Jun 19 '19 at 15:17
@Abram you can `sort_by{ |k, v| -v}`, no `reverse` needed! ;-) — Sony Santos, Feb 09 '20 at 12:07

score 133 · Answer 2 · answered Feb 26 '11 at 16:55

133

names.inject(Hash.new(0)) { |total, e| total[e] += 1 ;total}

gives you

{"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

answered Feb 26 '11 at 16:55

Mauricio

5,854
2
28
34

3

+1 Like the selected answer, but I prefer the use of inject and no "external" variable. – Feb 26 '11 at 18:40
18

If you use `each_with_object` instead of `inject` you don't have to return (`;total`) at the block. – mfilej Sep 24 '13 at 10:14
15

For posterity, this is what @mfilej means: `array.each_with_object(Hash.new(0)){|string, hash| hash[string] += 1}` – Gon Zifroni May 25 '15 at 20:58
7

From Ruby 2.7, you can simply do: `names.tally`. – Hallgeir Wilhelmsen Feb 15 '19 at 22:14

score 95 · Accepted Answer · answered Feb 26 '11 at 16:44

95

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
counts = Hash.new(0)
names.each { |name| counts[name] += 1 }
# => {"Jason" => 2, "Teresa" => 1, ....

answered Feb 26 '11 at 16:44

Dylan Markow

123,080
26
284
201

score 28 · Answer 4 · edited Apr 14 '16 at 18:18

28

Now using Ruby 2.2.0 you can leverage the itself method.

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
counts = {}
names.group_by(&:itself).each { |k,v| counts[k] = v.length }
# counts > {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

edited Apr 14 '16 at 18:18

notapatch

6,569
6
41
45

answered Jan 02 '16 at 12:02

Ahmed Fahmy

488
5
10

3

Agree, but I slightly prefer names.group_by(&:itself).map{|k,v| [k, v.count]}.to_h so that you don't have to ever declare a hash object – Andy Day Apr 13 '17 at 17:52
9

@andrewkday Taking this one step further, ruby v2.4 added the method: `Hash#transform_values` which allows us to simplify your code even more: `names.group_by(&:itself).transform_values(&:count)` – Tom Lord Jan 01 '18 at 22:16
Also, this is a very subtle point (which is likely no longer relevant to future readers!), but note that your code also uses [`Array#to_h`](https://ruby-doc.org/core/Array.html#method-i-to_h) - which was added to Ruby v2.1.0 (released December 2013 - i.e. almost 3 years after the original question was asked!) – Tom Lord Jan 01 '18 at 22:33

score 23 · Answer 5 · edited Dec 03 '21 at 11:35

23

Ruby 2.7+

Ruby 2.7 is introducing Enumerable#tally for this exact purpose. There's a good summary here.

In this use case:

array.tally
# => { "Jason" => 2, "Judah" => 3, "Allison" => 1, "Teresa" => 1, "Michelle" => 1 }

Docs on the features being released are here.

edited Dec 03 '21 at 11:35

Tomerikoo

18,379
16
47
61

answered Jun 07 '19 at 11:36

SRack

11,495
5
47
60

Fantastic news! – tadman Sep 19 '19 at 17:09

score 17 · Answer 6 · answered Feb 26 '11 at 17:51

17

There's actually a data structure which does this: MultiSet.

Unfortunately, there is no MultiSet implementation in the Ruby core library or standard library, but there are a couple of implementations floating around the web.

This is a great example of how the choice of a data structure can simplify an algorithm. In fact, in this particular example, the algorithm even completely goes away. It's literally just:

Multiset.new(*names)

And that's it. Example, using https://GitHub.Com/Josh/Multimap/:

require 'multiset'

names = %w[Jason Jason Teresa Judah Michelle Judah Judah Allison]

histogram = Multiset.new(*names)
# => #<Multiset: {"Jason", "Jason", "Teresa", "Judah", "Judah", "Judah", "Michelle", "Allison"}>

histogram.multiplicity('Judah')
# => 3

Example, using http://maraigue.hhiro.net/multiset/index-en.php:

require 'multiset'

names = %w[Jason Jason Teresa Judah Michelle Judah Judah Allison]

histogram = Multiset[*names]
# => #<Multiset:#2 'Jason', #1 'Teresa', #3 'Judah', #1 'Michelle', #1 'Allison'>

answered Feb 26 '11 at 17:51

Jörg W Mittag

363,080
75
446
653

Does the MultiSet concept originate from mathematics, or another programming language? – Andrew Grimm Feb 27 '11 at 22:02
2

@Andrew Grimm: Both he *word* "multiset" (de Bruijn, 1970s) and the *concept* (Dedekind 1888) originated in mathematics. `Multiset` is governed by strict mathematical rules and supports the typical set operations (union, intersection, complement, ...) in a way that is *mostly* consistent with the axioms, laws and theorems of "normal" mathematical set theory, although some important laws do *not* hold when you try to generalize them to multisets. But that's way beyond my understanding of the matter. I use them as a programming data structure, not as a mathematical concept. – Jörg W Mittag Feb 28 '11 at 03:26
To expand a *little* on that point: **"... in a way that is mostly consistent with the axioms ..."**: "Normal" sets are usually formally defined by a set of axioms (assumptions) called "Zermelo-Frankel set theory". However, one of these axioms: the [axiom of extensionality](https://en.wikipedia.org/wiki/Zermelo%E2%80%93Fraenkel_set_theory#1._Axiom_of_extensionality) states that a set is defined precisely by its members - e.g. `{A, A, B} = {A, B}`. This is clearly a violation of the very definition of multi-sets! – Tom Lord Jan 01 '18 at 23:46
...However, without going into too much detail (as this is a software forum, not advanced maths!), one *can* formally define multi-sets mathematically via axioms for Crisp sets, the Peano axioms and other MultiSet-specific axioms. – Tom Lord Jan 01 '18 at 23:49

score 14 · Answer 7 · edited Jul 24 '18 at 14:03

14

Enumberable#each_with_object saves you from returning the final hash.

names.each_with_object(Hash.new(0)) { |name, hash| hash[name] += 1 }

Returns:

=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

edited Jul 24 '18 at 14:03

Lev Lukomsky

6,346
4
34
24

answered Apr 14 '15 at 02:37

Anconia

3,888
6
36
65

Agree, `each_with_object` variant is more readable to me than `inject` – Lev Lukomsky Jul 24 '18 at 14:01

Shreyas · Answer 8 · 2011-02-27T06:29:05.283

6

This works.

arr = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
result = {}
arr.uniq.each{|element| result[element] = arr.count(element)}

edited Feb 27 '11 at 06:29

answered Feb 26 '11 at 16:55

Shreyas

8,737
7
44
56

2

+1 For a different approach -- although this has worse theoretical complexity -- `O(n^2)` (which will matter for some values of `n`) *and* does extra work (it has to count for "Judah" 3x, for instance)!. I would also suggest `each` instead of `map` (the map result is being discarded) – Feb 26 '11 at 18:44
Thanks for that! I've changed the map to each.Also, I've uniq'ed the array before going through it. Maybe now the complexity issue is solved? – Shreyas Feb 27 '11 at 06:29

score 6 · Answer 9 · answered Feb 27 '11 at 22:22

The following is a slightly more functional programming style:

array_with_lower_case_a = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
hash_grouped_by_name = array_with_lower_case_a.group_by {|name| name}
hash_grouped_by_name.map{|name, names| [name, names.length]}
=> [["Jason", 2], ["Teresa", 1], ["Judah", 3], ["Michelle", 1], ["Allison", 1]]

One advantage of group_by is that you can use it to group equivalent but not exactly identical items:

another_array_with_lower_case_a = ["Jason", "jason", "Teresa", "Judah", "Michelle", "Judah Ben-Hur", "JUDAH", "Allison"]
hash_grouped_by_first_name = another_array_with_lower_case_a.group_by {|name| name.split(" ").first.capitalize}
hash_grouped_by_first_name.map{|first_name, names| [first_name, names.length]}
=> [["Jason", 2], ["Teresa", 1], ["Judah", 3], ["Michelle", 1], ["Allison", 1]]

Did I hear functional programming? +1 :-) This is definitely the best way, although it can be argued that is not memory-efficient. Notice also that Facets has a Enumerable#frequency. — tokland, Oct 24 '11 at 10:20

score 5 · Answer 10 · answered Jun 18 '13 at 20:28

5

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]
Hash[names.group_by{|i| i }.map{|k,v| [k,v.size]}]
# => {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

answered Jun 18 '13 at 20:28

Arup Rakshit

116,827
30
260
317

score 5 · Answer 11 · answered Aug 14 '14 at 12:02

5

a = [1, 2, 3, 2, 5, 6, 7, 5, 5]
a.each_with_object(Hash.new(0)) { |o, h| h[o] += 1 }

# => {1=>1, 2=>2, 3=>1, 5=>3, 6=>1, 7=>1}

Credit Frank Wambutt

answered Aug 14 '14 at 12:02

narzero

2,199
5
40
73

score 2 · Answer 12 · answered Nov 21 '17 at 15:51

2

Lots of great implementations here.

But as a beginner I would consider this the easiest to read and implement

names = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

name_frequency_hash = {}

names.each do |name|
  count = names.count(name)
  name_frequency_hash[name] = count  
end
#=> {"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

The steps we took:

we created the hash
we looped over the names array
we counted how many times each name appeared in the names array
we created a key using the name and a value using the count

It may be slightly more verbose (and performance wise you will be doing some unnecessary work with overriding keys), but in my opinion easier to read and understand for what you want to achieve

answered Nov 21 '17 at 15:51

Sami Birnbaum

773
8
20

2

I don't see how that's any easier to read than the accepted answer, and it's clearly a worse design (doing lots of unnecessary work). – Tom Lord Apr 18 '18 at 17:00
@Tom Lord - I do agree with you on performance (I even mentioned that in my answer) - but as a beginner trying to understand the actual code and steps required, I find it helps to be more verbose and then one can refactor to improve performance and make code more declarative – Sami Birnbaum May 29 '19 at 09:07
1

I agree somewhat with @SamiBirnbaum. This is the only one that uses almost no special ruby knowledge like `Hash.new(0)`. The closest to pseudocode. That can be a good thing for readability but also doing unnecessary work can harm readability for readers who notice it because in more complex cases they will spend a little time thinking they're going crazy trying to figure out why it's done. – Adamantish Jun 07 '19 at 09:50

score 2 · Answer 13 · edited Dec 03 '21 at 11:34

2

With ruby 2.6 you can do:

names.to_h{ |name| [name, names.count(name)] }

gives you:

{"Jason"=>2, "Teresa"=>1, "Judah"=>3, "Michelle"=>1, "Allison"=>1}

edited Dec 03 '21 at 11:34

Tomerikoo

18,379
16
47
61

answered Apr 25 '21 at 13:38

Akash Kinwad

704
2
7
22

score 1 · Answer 14 · edited Dec 05 '13 at 22:24

1

arr = ["Jason", "Jason", "Teresa", "Judah", "Michelle", "Judah", "Judah", "Allison"]

arr.uniq.inject({}) {|a, e| a.merge({e => arr.count(e)})}

Time elapsed 0.028 milliseconds

interestingly, stupidgeek's implementation benchmarked:

Time elapsed 0.041 milliseconds

and the winning answer:

Time elapsed 0.011 milliseconds

:)

edited Dec 05 '13 at 22:24

jszobody

28,495
6
61
72

answered Dec 05 '13 at 22:05

Alex Moore-Niemi

2,913
2
24
22

score 1 · Answer 15 · answered Feb 27 '11 at 22:09

This is more a comment than an answer, but a comment wouldn't do it justice. If you do Array = foo, you crash at least one implementation of IRB:

C:\Documents and Settings\a.grimm>irb
irb(main):001:0> Array = nil
(irb):1: warning: already initialized constant Array
=> nil
C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:3177:in `rl_redisplay': undefined method `new' for nil:NilClass (NoMethodError)
        from C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:3873:in `readline_internal_setup'
        from C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4704:in `readline_internal'
        from C:/Ruby19/lib/ruby/site_ruby/1.9.1/rbreadline.rb:4727:in `readline'
        from C:/Ruby19/lib/ruby/site_ruby/1.9.1/readline.rb:40:in `readline'
        from C:/Ruby19/lib/ruby/1.9.1/irb/input-method.rb:115:in `gets'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:139:in `block (2 levels) in eval_input'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:271:in `signal_status'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:138:in `block in eval_input'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:189:in `call'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:189:in `buf_input'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:103:in `getc'
        from C:/Ruby19/lib/ruby/1.9.1/irb/slex.rb:205:in `match_io'
        from C:/Ruby19/lib/ruby/1.9.1/irb/slex.rb:75:in `match'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:287:in `token'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:263:in `lex'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:234:in `block (2 levels) in each_top_level_statement'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:230:in `loop'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:230:in `block in each_top_level_statement'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `catch'
        from C:/Ruby19/lib/ruby/1.9.1/irb/ruby-lex.rb:229:in `each_top_level_statement'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:153:in `eval_input'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:70:in `block in start'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:69:in `catch'
        from C:/Ruby19/lib/ruby/1.9.1/irb.rb:69:in `start'
        from C:/Ruby19/bin/irb:12:in `<main>'

C:\Documents and Settings\a.grimm>

That's because Array is a class.

How to count identical string elements in a Ruby array

15 Answers15

Ruby v2.7+ (latest)

Ruby v2.4+ (currently supported, but older)

Ruby v2.2+ (deprecated)

Linked