
I've solved a problem that asks you to write a method for determining what words in a supplied array are anagrams and group the anagrams into a sub array within your output.

I solved it in what seems to be the typical way: sorting each word's characters and grouping the words into a hash keyed by those sorted characters.

When I originally started looking for a way to do this I noticed that String#sum exists which adds the ordinals of each character together.

I'd like to try and work out some way to determine an anagram based on using sum. For example "cars" and "scar" are anagrams and their sum is 425.

Given an input of %w[cars scar for four creams scream racs], the expected output (which I already get using the hash solution) is: [["cars", "scar", "racs"], ["for"], ["four"], ["creams", "scream"]].

It seems like doing something like:

input.each_with_object(Hash.new []) do |word, hash|
  hash[word.sum] += [word]
end

is the way to go. That gives you a hash where the value for the key 425 is ["cars", "scar", "racs"]. What I think I'm missing is moving that into the expected output format.

corroded
Caley Woods

4 Answers


Unfortunately I don't think String#sum is a robust way to solve this problem.

Consider:

"zaa".sum # => 316
"yab".sum # => 316

Same sum, but not anagrams.

Instead, how about grouping them by the sorted order of their characters?

words = %w[cars scar for four creams scream racs]

anagrams = words.group_by { |word| word.chars.sort }.values
# => [["cars", "scar", "racs"], ["for"], ["four"], ["creams", "scream"]] 
Andy Lindeman
  • That seems to be the generally accepted solution and for good reason. At first blush when starting with the problem I thought that sum seemed like maybe an alternate way to attack it. My original solution isn't as eloquent as yours but it uses the same word.chars.sort idea. Just trying to stay fresh :) – Caley Woods Mar 01 '12 at 14:56
  • Also I did submit my gisted solution and it passed the specs they use in the autograder just as my original solution does. I re-submitted the original solution just so the correct implementation is on file. It's always fun to experiment. – Caley Woods Mar 01 '12 at 14:58

To get the desired output format, you just need hash.values. But note that just using the sum of the character codes in a word could fail on some inputs. It is possible for the sums of the character codes in two words to be the same by chance, when they are not anagrams.
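
Both points can be seen in a minimal sketch. The helper name `group_by_sum` is hypothetical and only for illustration; it groups by String#sum and returns the hash values, which works for the question's input but merges the two non-anagrams "zaa" and "yab".

```ruby
# Hypothetical helper: group words by String#sum and keep only the groups.
def group_by_sum(words)
  words.group_by(&:sum).values
end

words = %w[cars scar for four creams scream racs]
p group_by_sum(words)
# The question's input happens to come out in the expected format.

p group_by_sum(%w[zaa yab])
# These two words are not anagrams, yet both sum to 316 and share a group.
```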

If you used a different algorithm to combine the character codes, the chances of incorrectly identifying words as "anagrams" could be made much lower, but still not zero. Basically you need some kind of hash algorithm, but with the property that the order of the values being hashed doesn't matter. Perhaps map each character to a different random bitstring, and take the sum of the bitstrings for each character in the string?

That way, the chance of any two non-anagrams giving you a false positive would be approximately 1 in 2 ** bitstring_length.
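
A rough sketch of that idea follows. Everything named here is an assumption made for illustration: the 64-bit length, the fixed seed (used only so that runs are repeatable), and the names `BITS`, `CODES`, `order_free_hash` and `group_anagrams`. Because addition is commutative, the order of the characters does not affect the result.

```ruby
BITS = 64             # assumed bitstring length; longer means fewer false positives
rng = Random.new(42)  # assumed fixed seed, only to make runs repeatable

# Lazily give each distinct character its own random BITS-bit integer.
CODES = Hash.new { |codes, ch| codes[ch] = rng.rand(2**BITS) }

# Sum the per-character codes and wrap the total back into BITS bits.
def order_free_hash(word)
  word.each_char.sum { |ch| CODES[ch] } % (2**BITS)
end

# Group words whose order-independent hashes match.
def group_anagrams(words)
  words.group_by { |word| order_free_hash(word) }.values
end

p group_anagrams(%w[cars scar for four creams scream racs])
```

This is still probabilistic: two non-anagrams can collide, just with a very small chance, so the sorted-characters key remains the exact approach.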

Alex D
  • I ended up with https://gist.github.com/b1fb5aab6893da0ed933. It's a little naive as you mention but in the context of this puzzle I believe it works just as another way of going about it. – Caley Woods Mar 01 '12 at 14:39

words = %w[cars scar for four creams scream racs]
res = {}

words.each do |word|
  # Anagrams share the same sorted-letter key.
  key = word.chars.sort.join
  res[key] ||= []
  res[key] << word
end

p res.values
# prints [["cars", "scar", "racs"], ["for"], ["four"], ["creams", "scream"]]
Yoann Le Touche

Actually, I think you could use sums for anagram testing, just not by summing the chars' ordinals directly. Something like this instead:

words = %w[cars scar for four creams scream racs]
# get the length of the longest word:
maxlen = words.map(&:length).max
# => 6 
words.group_by{|word|
  word.bytes.map{|b|
    maxlen ** (b-'a'.ord)
  }.inject(:+)
}
# => {118486616113189=>["cars", "scar", "racs"], 17005023616608=>["for"], 3673163463679584=>["four"], 118488792896821=>["creams", "scream"]} 

Not sure if this is 100% correct, but I think the logic stands.

The idea is to map every word to a base-N number, with every digit position representing a different char. N is the length of the longest word in the input set.
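
One caveat on the choice of N, shown as a hedged sketch rather than a definitive fix: for the digits never to carry into each other, the base has to be larger than the highest possible count of a single letter, which means the longest word length plus one. With base 6, "aaaaaa" (six times 6 ** 0) and "b" (6 ** 1) both map to 6. The helper names `positional_key` and `group_by_positional_key` are hypothetical, and the code assumes a non-empty list of words made only of lowercase a-z.

```ruby
# Each letter contributes base ** (its index in the alphabet), so a word's key
# is a base-N number whose digits are its letter counts.
# Assumes lowercase a-z only.
def positional_key(word, base)
  word.bytes.sum { |b| base**(b - 'a'.ord) }
end

# Use (longest word length + 1) as the base so no letter count can overflow
# into the next digit. Assumes words is non-empty.
def group_by_positional_key(words)
  base = words.map(&:length).max + 1
  words.group_by { |word| positional_key(word, base) }.values
end

p group_by_positional_key(%w[cars scar for four creams scream racs])
p group_by_positional_key(%w[aaaaaa b])
```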

Mladen Jablanović
  • Testing this using Andy Lindemans example of zaa and yab below results in the correct functionality that they are not grouped together. I added you to the gist linked in my comment to Alex D. – Caley Woods Mar 01 '12 at 18:05