
I have an array containing n elements. Each element contains two words.

This makes the array look like this: ['England John', 'England Ben', 'USA Paul', 'England John']

I want to find the number of unique names for each country. For example, England would have 2 unique names, since John appears twice.

So far I have split the array into two arrays, one containing the countries, such as ['England', 'USA', ...], and the other containing the names, ['John', 'Paul', ...]. However, I'm unsure where to go from here.

iGian
Ben Williams
  • Sounds like you want to use some kind of multi-map structure rather than an array, where the Country is the key and the names the values – Tom Jun 04 '19 at 14:52

4 Answers


One-liner option:

ary.uniq.group_by { |e| e.split.first }.transform_values(&:count)
#=> {"England"=>2, "USA"=>1}
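Breaking the chain into its individual calls shows what each step produces. The intermediate variable names (step1, step2, step3) are illustrative only, and Hash#transform_values assumes Ruby 2.4 or later:

```ruby
ary = ['England John', 'England Ben', 'USA Paul', 'England John']

# Step 1: drop exact duplicate "Country Name" strings
step1 = ary.uniq
# => ["England John", "England Ben", "USA Paul"]

# Step 2: group the remaining strings by their first word (the country)
step2 = step1.group_by { |e| e.split.first }
# => {"England"=>["England John", "England Ben"], "USA"=>["USA Paul"]}

# Step 3: replace each group (an array of strings) with its element count
step3 = step2.transform_values(&:count)
# => {"England"=>2, "USA"=>1}
```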
iGian
  • Note that `transform_values` is an ActiveSupport method. If you're not using Rails, you can replace it with `map { |country, occurrences| [country, occurrences.count] }.to_h`. – ndnenkov Jun 04 '19 at 15:31
  • @ndnenkov, `transform_values` is pure Ruby: https://ruby-doc.org/core-2.6.3/Hash.html#method-i-transform_values, but for older Ruby your option is ok. – iGian Jun 04 '19 at 16:10
  • ndnenkov and @pascalbetz, [Hash#transform_values](http://ruby-doc.org/core-2.5.1/Hash.html#method-i-transform_values) (and `transform_values!`, `transform_keys` and `transform_keys!`) made its debut in Ruby v2.4. – Cary Swoveland Jun 04 '19 at 16:10
  • @iGian Thank you for your reply. Please could you explain what the different parts do? I'm especially confused by the `{ |e| e.split.first }` bit, and the `transform_values(&:count)` bit – Ben Williams Jun 05 '19 at 12:34
  • @BenWilliams I'd suggest you run the code step by step: `ary.uniq`, then `ary.uniq.group_by { |e| e.split.first }` and so on. `|e|` is the variable that contains the string, which is split into an array of which we take the first element: `"a b".split.first` returns `"a"`, which is the value used for grouping. For `transform_values`, please see the comments above. – iGian Jun 05 '19 at 13:57
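For Ruby versions older than 2.4, which lack Hash#transform_values, the map/to_h replacement suggested in the comments slots into the same chain. A sketch, assuming the same sample array as the question:

```ruby
ary = ['England John', 'England Ben', 'USA Paul', 'England John']

# Same uniq + group_by as the one-liner, but the counts hash is built with
# map + Array#to_h (available since Ruby 2.1) instead of Hash#transform_values
counts = ary.uniq
            .group_by { |e| e.split.first }
            .map { |country, occurrences| [country, occurrences.count] }
            .to_h
# => {"England"=>2, "USA"=>1}
```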

The problem, really, is that you're storing this data as an array of strings. This is a poor choice of data structure, as it makes manipulation much harder.

Suppose, for example, we first convert this data into a Hash, which maps each country to the list of names:

data = ['England John', 'England Ben', 'USA Paul', 'England John']

mapped_names = {}

data.each do |item|
  country, name = item.split
  mapped_names[country] ||= []
  mapped_names[country] << name
end

Now, obtaining the count is quite easy:

mapped_name_counts = mapped_names.transform_values { |names| names.uniq.count }

The resulting variables are:

mapped_names # => {"England"=>["John", "Ben", "John"], "USA"=>["Paul"]}
mapped_name_counts # => {"England"=>2, "USA"=>1}
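The same country-to-names Hash can also be built without the explicit loop, by grouping the split [country, name] pairs. A sketch assuming Ruby 2.4+ (for Hash#transform_values); pairs_by_country is an illustrative name:

```ruby
data = ['England John', 'England Ben', 'USA Paul', 'England John']

# Split each string into [country, name] and group the pairs by country
pairs_by_country = data.map(&:split).group_by(&:first)

# Keep only the name from each pair
mapped_names = pairs_by_country.transform_values { |pairs| pairs.map(&:last) }
# => {"England"=>["John", "Ben", "John"], "USA"=>["Paul"]}

# Count distinct names per country, exactly as above
mapped_name_counts = mapped_names.transform_values { |names| names.uniq.count }
# => {"England"=>2, "USA"=>1}
```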

And with Ruby 2.7 or later, which adds Enumerable#tally, the whole calculation could even be simplified to:

mapped_name_counts = data.uniq.map { |item| item.split.first }.tally
Tom Lord
  • It looks like you switched from the name `unique_names` to `mapped_names`, but didn't change all occurrences of the former. No reply needed. I'll delete this comment when you've seen it. – Cary Swoveland Jun 04 '19 at 16:47
  • It looks like [Enumerable#tally](https://blog.saeloun.com/2019/03/03/ruby-2-7-enumerable-tally.html) will be quite useful. – Cary Swoveland Jun 04 '19 at 16:51
  • @CarySwoveland [Yep, I agree](https://stackoverflow.com/a/48053739/1954610) – Tom Lord Jun 04 '19 at 17:39
arr = ['England John', 'England Ben', 'USA Paul', 'England John']

arr.uniq.each_with_object(Hash.new(0)) { |s,h| h[s[/\S+/]] += 1 }
  #=> {"England"=>2, "USA"=>1}

This requires two passes through the array (arr.uniq being the first). To make only a single pass, one could do the following.

require 'set'

uniques = Set.new
arr.each_with_object(Hash.new(0)) { |s,h| h[s[/\S+/]] += 1 if uniques.add?(s) }
  #=> {"England"=>2, "USA"=>1}

See the form of Hash::new that takes an argument (called the default value), and also Set#add?.
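A short illustration of the building blocks used above: the regex lookup that extracts the country, Hash.new with a default value, and Set#add?. The variable names are illustrative:

```ruby
require 'set'

# String#[] with a Regexp returns the first match: the first run of non-whitespace
first_word = 'England John'[/\S+/]        # => "England"

# Hash.new(0): a missing key reads as the default 0, so h[k] += 1 needs no setup
h = Hash.new(0)
h['England'] += 1
h['England'] += 1
usa = h['USA']                            # => 0 (reading the default does not store the key)
# h is now {"England"=>2}

# Set#add? returns the set itself when the element is new, nil when it is already present
uniques = Set.new
first_add  = uniques.add?('England John') # truthy (the set itself)
second_add = uniques.add?('England John') # => nil
```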

It's not clear to me which of the two calculations would generally be faster.
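One way to settle this for a particular data shape is to measure it with Ruby's standard Benchmark library. In the sketch below, the helper names and the synthetic input (50 countries, 1,000 distinct strings, many repeats) are assumptions, and relative timings will vary with the data and the Ruby version:

```ruby
require 'benchmark'
require 'set'

# Hypothetical helper wrapping the two-pass version (uniq, then count)
def two_passes(arr)
  arr.uniq.each_with_object(Hash.new(0)) { |s, h| h[s[/\S+/]] += 1 }
end

# Hypothetical helper wrapping the single-pass version
def single_pass(arr)
  uniques = Set.new # fresh set per call so repeated calls don't interfere
  arr.each_with_object(Hash.new(0)) { |s, h| h[s[/\S+/]] += 1 if uniques.add?(s) }
end

# Synthetic input: "Country0 Name0", "Country1 Name1", ... with many duplicates
arr = Array.new(100_000) { |i| "Country#{i % 50} Name#{i % 1_000}" }

Benchmark.bm(12) do |x|
  x.report('two passes')  { 5.times { two_passes(arr) } }
  x.report('single pass') { 5.times { single_pass(arr) } }
end
```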

Cary Swoveland

A bit more verbose than the other solutions, but it does not rely on transform_values (part of core Ruby only since 2.4; older versions need ActiveSupport).

require "set"

data = ["England John", "England Ben", "USA Paul", "England John", "Switzerland Pascal"]

names_per_country = data.each_with_object({}) do |country_and_name, accu|
  country, name = country_and_name.split(" ")
  country_data = accu[country] ||= Set.new
  country_data << name
end

names_per_country.each do |country, names|
  puts "#{country} has #{names.size} unique name(s)"
end

# Output:
# England has 2 unique name(s)
# USA has 1 unique name(s)
# Switzerland has 1 unique name(s)

This solution first transforms the array into a Hash, where the key is the country name and the value is a Set. I've chosen Set because it takes care of the "unique" part of your question automatically (a Set cannot contain duplicates).

After that, you can find the number of unique names per country by checking the size of each Set. If required, you can also get the names themselves (the elements of the Set).
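A variant of the same idea, sketched with a Hash default block instead of ||=, shows both the names and the per-country counts. The names england_names and counts are illustrative, and the counts hash is built with map/to_h so it still avoids transform_values:

```ruby
require 'set'

data = ["England John", "England Ben", "USA Paul", "England John", "Switzerland Pascal"]

# The default block creates (and stores) an empty Set the first time a country is seen
names_per_country = data.each_with_object(Hash.new { |h, k| h[k] = Set.new }) do |country_and_name, accu|
  country, name = country_and_name.split(" ")
  accu[country] << name
end

# The names themselves; Set#to_a returns them in insertion order
england_names = names_per_country["England"].to_a
# => ["John", "Ben"]

# Unique-name count per country
counts = names_per_country.map { |country, names| [country, names.size] }.to_h
# => {"England"=>2, "USA"=>1, "Switzerland"=>1}
```

One side effect of the default block: reading a country that is not in the data (e.g. names_per_country["France"]) also inserts an empty Set for it.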

Pascal