Fast anagram solving

Question

Given two strings, I would like to determine whether or not they are anagrams of one another. Here is the solution that I came up with:

# output messages
def anagram
  puts "Anagram!"
  exit
end

def not_anagram
  puts "Not an anagram!"
  exit
end

# main method
if __FILE__ == $0
  # read two strings from standard input, one per line
  first, second = gets.chomp, gets.chomp

  # special case 1: strings of different lengths can never be anagrams
  not_anagram if first.length != second.length

  # special case 2: identical strings are trivially anagrams
  anagram if first == second

  # general case
  # Two strings are anagrams only if they contain exactly the same
  # characters, with the same counts and the same case.
  # Sorting both strings' characters and comparing the results checks this.
  if first.chars.sort.join == second.chars.sort.join
    anagram
  else
    not_anagram
  end
end

But I suspect there is a better one. I analyzed the time complexity of this solution and came up with:

chars: splits a string into an array of characters, O(n)
sort: sorts the array of characters. I don't know how sort is implemented in Ruby, but I assumed O(n log n), since that is the best possible bound for a comparison sort
join: builds a string from an array of characters, O(n)
==: the string comparison itself examines, in the worst case, every character of both strings, O(n)
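As a rough tally of the list above, assuming both strings have length n (which holds once the length check passes) and that chars, sort and join each run once per string, the costs add up as follows; the sort term dominates:

```latex
T(n) = \underbrace{2\,O(n)}_{\texttt{chars}}
     + \underbrace{2\,O(n \log n)}_{\texttt{sort}}
     + \underbrace{2\,O(n)}_{\texttt{join}}
     + \underbrace{O(n)}_{\texttt{==}}
     = O(n \log n)
```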

Given the above, I categorized the whole solution as O(n log n), since sorting is the most expensive step. Is there a way to do this that is more efficient than O(n log n)?

score 6 · Accepted Answer · answered Mar 12 '13 at 23:11

Your big O should be O(n log n), since the sort is the limiting step. If you try it with very large anagrams, you will see performance degrade faster than expected for an O(n) solution.

You can get an O(n) solution by building two maps from character to character count, one per string, and comparing them.
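A minimal sketch of that two-map comparison in Ruby. The names `char_counts` and `anagram_by_counts?` are hypothetical, chosen for illustration. Each string is scanned once and hash updates are expected O(1), so the check should be O(n) on average; `Hash#==` compares keys and values regardless of insertion order.

```ruby
# Hypothetical helper: map each character to the number of times it occurs.
def char_counts(str)
  counts = Hash.new(0) # missing keys default to a count of 0
  str.each_char { |c| counts[c] += 1 }
  counts
end

# Two strings are anagrams exactly when their character-count maps are equal.
# The length check is just a cheap early exit.
def anagram_by_counts?(first, second)
  first.length == second.length && char_counts(first) == char_counts(second)
end
```

Like the code in the question, this is case-sensitive: `anagram_by_counts?("Listen", "Silent")` is false because `"L"` and `"S"` are different characters.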

There are definitely other solutions with roughly the same complexity, but I don't think you can do better than O(n).

Daniel Williams

Sorry, I meant to put `n log n`. I have that in my notes; I just copied the wrong formula into the question. – Hunter McMillen Mar 12 '13 at 23:14
+1. Strictly speaking, the count solution is O(max(n, |alphabet|)). The size of the alphabet is technically constant, but if the alphabet is Unicode and the strings are not huge, it will dominate. – rici Mar 12 '13 at 23:15
The count solution being: iterate through both strings, hash each character, and update its count as you go along? I figured this problem would always take at least linear time, since the comparison always has to look at the entire length of the strings. – Hunter McMillen Mar 12 '13 at 23:15
Yes. Store the counts for each character in different maps and then compare the maps at the end. – Daniel Williams Mar 12 '13 at 23:17
@DanielWilliams Thanks, this was really helpful. I also read about a solution that counts the byte values for each character in the strings, but discarded it due to the possibility of collisions. – Hunter McMillen Mar 12 '13 at 23:18
Omega(n) for the general case is clear since you can't possibly decide for two different strings whether they contain the same characters without checking all characters. – G. Bach Mar 13 '13 at 01:04
@HunterMcMillen: Before you start, you have to clear the count array and at the end you have to scan it to make sure that all of its elements are 0. So the size of the count array is also a factor in the time computation. – rici Mar 13 '13 at 02:01
@rici I am not sure what you mean, what count array? – Hunter McMillen Mar 13 '13 at 03:16
@rici you'd use a Hash, not an Array, so the *time* does not depend on alphabet size, only on string length. *Space*, on the other hand, would depend on alphabet size for counting characters. – AJcodez Mar 13 '13 at 03:50
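To make the collision concern in Hunter McMillen's comment concrete, here is a small Ruby illustration. It assumes "counts the byte values" means adding up each string's byte values; `byte_sum` is a hypothetical helper, shown only to demonstrate why that shortcut is not a valid anagram test.

```ruby
# Hypothetical helper for illustration: add up the byte values of a string.
# This is the checksum-style shortcut being rejected, NOT a correct test.
def byte_sum(str)
  str.bytes.sum
end

# 'a' + 'd' = 97 + 100 = 197 and 'b' + 'c' = 98 + 99 = 197,
# so the sums collide even though the strings share no characters.
puts byte_sum("ad") == byte_sum("bc")   # prints "true"
puts "ad".chars.sort == "bc".chars.sort # prints "false"
```

Equal sums are necessary for two strings to be anagrams but not sufficient, so any such check would still need an exact comparison to rule out false positives.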

score 3 · Answer 2 · answered Mar 12 '13 at 23:43

Example of counting:

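def anagram?(str_a, str_b)
  # Strings of different lengths can never be anagrams.
  if str_a.length != str_b.length
    false
  else
    # Count each character of the first string (missing keys default to 0).
    counts = Hash.new(0)
    str_a.each_char { |c| counts[c] += 1 }
    # Consume those counts with the second string. Because the lengths are
    # equal, if no count ever goes negative, every count ends at exactly 0.
    str_b.chars.none? { |c| (counts[c] -= 1) < 0 }
  end
end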

anagram? 'care', 'race'
# => true
anagram? 'cat', 'dog'
# => false
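On Ruby 2.7 or later, `Enumerable#tally` builds the same kind of character-count hash in a single call, so a shorter variant is possible. This is a sketch, not part of the answer above: `anagram_by_tally?` is a hypothetical name, and unlike the `none?` version it always builds both hashes instead of stopping at the first surplus character.

```ruby
# Requires Ruby 2.7+, where Enumerable#tally was introduced.
# tally turns ["c", "a", "r", "e"] into {"c"=>1, "a"=>1, "r"=>1, "e"=>1}.
def anagram_by_tally?(str_a, str_b)
  # The length check is a cheap early exit before building any hashes.
  str_a.length == str_b.length && str_a.chars.tally == str_b.chars.tally
end

anagram_by_tally? 'care', 'race'
# => true
anagram_by_tally? 'cats', 'cat'
# => false
```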

AJcodez

`anagram? 'cats', 'cat'` will return true. – Juan Lopes Mar 13 '13 at 01:27

@JuanLopes 'cats', 'cat' won't reach the `else` block because they are different lengths, so it will in fact return `false`. – Hunter McMillen Mar 13 '13 at 03:12

score 3 · Answer 3 · answered Mar 13 '13 at 11:40

You can do it in O(n + m), where m is the size of the alphabet:

1. Create an array whose size equals the size of your input alphabet.

2. Initialize all the values in the array to 0.

3. Scan the first input string and increment the corresponding value in the array for each character (e.g. increment array[0] if the first letter of the alphabet is found).

4. Repeat the same for the second string, except that this time the values in the array are decremented.

If all the values in the array are 0, the two strings are anagrams; otherwise they are not.
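A minimal Ruby sketch of these steps. It assumes, for illustration only, that the alphabet is the 26 lowercase ASCII letters `'a'..'z'` (so m = 26); `ALPHABET_SIZE`, `alphabet_index` and `anagram_by_array?` are hypothetical names, and characters outside that alphabet raise an error rather than being silently ignored.

```ruby
# Assumption for this sketch: the input alphabet is 'a'..'z', so m = 26.
ALPHABET_SIZE = 26

# Hypothetical helper: position of a character within the alphabet.
def alphabet_index(c)
  i = c.ord - 'a'.ord
  unless i.between?(0, ALPHABET_SIZE - 1)
    raise ArgumentError, "#{c.inspect} is not in 'a'..'z'"
  end
  i
end

def anagram_by_array?(first, second)
  # Steps 1 and 2: one counter per letter of the alphabet, all set to 0.
  counts = Array.new(ALPHABET_SIZE, 0)

  # Step 3: increment the counter for each character of the first string.
  first.each_char { |c| counts[alphabet_index(c)] += 1 }

  # Step 4: decrement the counter for each character of the second string.
  second.each_char { |c| counts[alphabet_index(c)] -= 1 }

  # Final check: anagrams exactly when every counter is back to 0.
  # Creating and scanning this array is the O(m) part of O(n + m).
  counts.all?(&:zero?)
end
```

No separate length check is needed here: if the lengths differ, at least one counter cannot return to 0.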

edited Jul 05 '13 at 08:12

answered Mar 13 '13 at 11:40

banarun

@funtime `n` is the length of the two strings, so this algorithm is actually `O(m)` where `m` is the length of the alphabet used, but still a good solution. – Hunter McMillen Mar 13 '13 at 15:34

score 1 · Answer 4 · answered Mar 13 '13 at 06:02

I needed something to check anagrams, and came up with this:

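# Normalize a string: lowercase it, drop everything except the letters a-z,
# and sort the remaining characters.
def string_to_array(s)
  s.downcase.gsub(/[^a-z]+/, '').chars.sort
end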

def is_anagram?(s1, s2)
  string_to_array(s1) == string_to_array(s2)
end

puts is_anagram?("Arrigo Boito",       "Tobia Gorrio")
puts is_anagram?("Edward Gorey",       "Ogdred Weary")
puts is_anagram?("Ogdred Weary",       "Regera Dowdy")
puts is_anagram?("Regera Dowdy",       "E. G. Deadworry")
puts is_anagram?("Vladimir Nabokov",   "Vivian Darkbloom")
puts is_anagram?("Vivian Darkbloom",   "Vivian Bloodmark")
puts is_anagram?("Dave Barry",         "Ray Adverb")
puts is_anagram?("Glen Duncan",        "Declan Gunn")
puts is_anagram?("Damon Albarn",       "Dan Abnormal")
puts is_anagram?("Tom Cruise",         "So I'm cuter")
puts is_anagram?("Tom Marvolo Riddle", "I am Lord Voldemort")
puts is_anagram?("Torchwood",          "Doctor Who")
puts is_anagram?("Hamlet",             "Amleth")
puts is_anagram?("Rocket boys",        "October Sky")
puts is_anagram?("Imogen Heap",        "iMegaphone")

Thanks for your solution. Unfortunately, for this problem anagrams were case-sensitive and whitespace was significant, so `"the eyes"` is not an anagram of `"theeyes"` in my case. – Hunter McMillen, Mar 13 '13 at 15:33
