Scala: Anagram of two strings using prime number

Question


I'm working on a function that checks whether two strings are anagrams. The simple version converts both strings into a CharArray, sorts them and compares the two arrays. This works because anagrams contain exactly the same letters, so they become identical once sorted. For instance, "god" and "dog" both sort to "dgo".

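import java.util.Arrays

def isAnagram2(s: String, t: String): Boolean = {
  // Null input or different lengths can never be anagrams
  if (s == null || t == null || s.length != t.length) false
  else {
    val str1: Array[Char] = s.toCharArray
    val str2: Array[Char] = t.toCharArray
    // Sort both character arrays in place, then compare them element by element
    Arrays.sort(str1)
    Arrays.sort(str2)
    Arrays.equals(str1, str2)
  }
}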
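For comparison, here is a minimal sketch of the same sort-and-compare idea using Scala's `StringOps.sorted`, which returns a new sorted `String` instead of sorting a `char[]` in place. The object name `SortedAnagram` is hypothetical; the cost is still O(n log n) per string.

```scala
object SortedAnagram {
  // Anagrams contain exactly the same characters, so their sorted forms are equal.
  // StringOps.sorted returns a new String with the characters in ascending order.
  def isAnagramSorted(s: String, t: String): Boolean =
    s != null && t != null && s.length == t.length && s.sorted == t.sorted
}
```

For example, `SortedAnagram.isAnagramSorted("god", "dog")` is true because both sides sort to `"dgo"`.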

The code above compiles and works on Scala 2.10. The output is:

 apple, papel: true
 carrot, tarroc: true
 hello, llloh: false
 abba, xyzz: false

However, this is not very efficient, since sorting both strings costs O(n log n) time, which adds up for very long strings. According to this post, "Comparing anagrams using prime numbers", the fastest way of checking two strings for anagrams would be to use prime numbers as a hash function.

The main idea is:

Assuming both strings have the same length:

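1) Generate a hash for each character by simple substitution with a distinct prime, e.g. b -> 3

2) Multiply all the hash values together; because every integer has a unique prime factorization, the product identifies exactly which letters occur and how often

3) Compare the prime hash of string A with that of string B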
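The three steps above can be sketched as follows. This is a minimal sketch, assuming lowercase ASCII letters only and assigning the first 26 primes in alphabetical order (which matches the `'a' -> 2 ... 'z' -> 101` range used later). The names `PrimeSteps`, `letterPrime`, `primeHash` and `sameLetters` are hypothetical.

```scala
object PrimeSteps {
  // The first 26 primes (2, 3, 5, ..., 101), generated by simple trial division.
  private val primes: Vector[Int] =
    Iterator.from(2).filter(n => (2 until n).forall(n % _ != 0)).take(26).toVector

  // Step 1: substitute each letter with its own prime ('a' -> 2, 'b' -> 3, ..., 'z' -> 101).
  // Assumes a lowercase ASCII letter; anything else fails the require check.
  def letterPrime(c: Char): Int = {
    require(c >= 'a' && c <= 'z', s"unsupported character: $c")
    primes(c - 'a')
  }

  // Step 2: multiply the primes. BigInt keeps the product exact for long strings.
  def primeHash(s: String): BigInt =
    s.foldLeft(BigInt(1))((acc, c) => acc * letterPrime(c))

  // Step 3: equal products mean the same letters with the same counts (unique factorization).
  def sameLetters(a: String, b: String): Boolean =
    a.length == b.length && primeHash(a) == primeHash(b)
}
```

Generating the table avoids typing 26 map entries by hand; a literal `Map` works just as well.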

If both strings have the same length and are made of the same characters (each appearing the same number of times), they have the same prime hash.

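For example, with the primes assigned in alphabetical order (a -> 2, c -> 5, t -> 71), ‘cat’ and ‘act’ would look like:

hash_act = p(a) * p(c) * p(t) = 2 * 5 * 71 = 710
hash_cat = p(c) * p(a) * p(t) = 5 * 2 * 71 = 710

so hash_act == hash_cat.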

The point is that this version is independent of character order, so it needs no sorting: each character costs one constant-time lookup and one multiplication, so the whole string is hashed in O(n).
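A small sketch of the order-independence claim: every rearrangement of "cat" produces the same product, because multiplication is commutative. The object name and the three-entry prime table are hypothetical; the values match the alphabetical prime assignment.

```scala
object OrderIndependence {
  // Hypothetical minimal table covering only the letters of "cat".
  private val prime = Map('a' -> 2, 'c' -> 5, 't' -> 71)

  // Product of the letter primes; Int is safe here because the strings are tiny.
  def productOf(s: String): Int = s.foldLeft(1)((acc, c) => acc * prime(c))

  // Hash every permutation of "cat" and collect the distinct results.
  val allPermutationHashes: Set[Int] = "cat".permutations.map(productOf).toSet
}
```

All six permutations collapse to the single value 710.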

In practice, I have an object PrimeHash:

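object PrimeHash {
  // One distinct prime per lowercase letter ('a' -> 2, 'b' -> 3, ..., 'z' -> 101); middle entries elided
  private[this] final val primeAlphabet: Map[Char, Int] = Map('a' -> 2, 'b'  .., 'z' -> 101)

  // Multiply the primes of all characters. BigInt instead of Int: an Int product
  // silently overflows after a handful of characters and can then collide.
  // Any character outside 'a'..'z' (e.g. an inner space) throws NoSuchElementException.
  def hashOf(string: String): BigInt =
    string.trim.toLowerCase.foldLeft(BigInt(1)) { (hash, c) => hash * primeAlphabet(c) }
}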
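If the product is accumulated in an `Int`, Scala wraps around silently on overflow (arithmetic modulo 2^32), so long strings can collide. A minimal sketch with a hypothetical three-letter table: two strings with different letters both hash to 0 with `Int`, while `BigInt` keeps them apart. The names `OverflowDemo`, `intHash` and `bigHash` are illustrative only.

```scala
object OverflowDemo {
  // Only the letters needed here; values match the alphabetical prime table (a -> 2, b -> 3, c -> 5).
  private val prime = Map('a' -> 2, 'b' -> 3, 'c' -> 5)

  // Int accumulator: 2^32 wraps to 0, and 0 times anything stays 0.
  def intHash(s: String): Int = s.foldLeft(1)((acc, c) => acc * prime(c))

  // BigInt accumulator: the exact product, no wrap-around.
  def bigHash(s: String): BigInt = s.foldLeft(BigInt(1))((acc, c) => acc * prime(c))

  // Different letters: 32 'a's plus a 'b', versus 32 'a's plus a 'c'.
  val x: String = "a" * 32 + "b"
  val y: String = "a" * 32 + "c"
}
```

With `Int`, `x` and `y` look like anagrams even though they are not; `Long` only postpones the problem, which is why `BigInt` is the safer accumulator.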

and use the hashOf function like so:

def isAnagram(s: String, t: String): Boolean = {
  if (s == null || t == null || s.length != t.length) false
  else if (PrimeHash.hashOf(s).equals(PrimeHash.hashOf(t))) true
  else false
}

However, my simple test case fails to detect non-anagrams. Here is the test code:

def main(args: Array[String]): Unit = {
  val pairs = Array(Array("apple", "papel"), Array("carrot", "tarroc"), Array("hello", "llloh"), Array("abba", "xyzz"))

  for (p <- pairs) {
    val word1 = p(0)
    val word2 = p(1)
    // isAnagram2 is the sort-based check; call isAnagram instead to test the prime-hash version
    val anagram = isAnagram2(word1, word2)
    println(word1 + ", " + word2 + ": " + anagram)
  }
}
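A self-contained sketch that runs the same four pairs through a sort-based check and a prime-product check side by side, so any disagreement between the two approaches shows up immediately. It assumes lowercase ASCII input; `PairCheck`, `bySorting`, `byPrimes` and `report` are hypothetical names, not the question's code.

```scala
object PairCheck {
  // Hypothetical prime table for 'a'..'z' (2, 3, 5, ..., 101), built by trial division.
  private val primes: Vector[Int] =
    Iterator.from(2).filter(n => (2 until n).forall(n % _ != 0)).take(26).toVector

  // Sort-based check, the same idea as isAnagram2.
  def bySorting(s: String, t: String): Boolean =
    s.length == t.length && s.sorted == t.sorted

  // Prime-product check with an exact BigInt product.
  def byPrimes(s: String, t: String): Boolean = {
    def hash(w: String): BigInt = w.foldLeft(BigInt(1))((acc, c) => acc * primes(c - 'a'))
    s.length == t.length && hash(s) == hash(t)
  }

  val pairs: Seq[(String, String)] =
    Seq("apple" -> "papel", "carrot" -> "tarroc", "hello" -> "llloh", "abba" -> "xyzz")

  // Prints both results per pair, e.g. "hello, llloh: sorting=false primes=false".
  def report(): Unit =
    for ((w1, w2) <- pairs)
      println(s"$w1, $w2: sorting=${bySorting(w1, w2)} primes=${byPrimes(w1, w2)}")
}
```

All four pairs have matching lengths, so the length check alone rejects nothing here; the hash has to do the work.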

The sorting function correctly rejects the two "wrong" pairs, but the hashing one does not.

Full code on github: https://gist.github.com/marvin-hansen/9953592

I'm not entirely sure whether the hashOf function is correct.

Solution: Fixed a typo that caused the hashOf value of the same string (t) to be compared with itself. Thanks to mesutozer.

score 3 · Accepted Answer · answered Apr 03 '14 at 12:55


You have a typo: you are comparing the hashOf of the same value (t) with itself:

else if(PrimeHash.hashOf(t).equals(PrimeHash.hashOf(t))) true
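A minimal sketch of why this typo hides every mismatch: hashing `t` twice makes the comparison true for any equal-length pair, whatever the hash function is. `TypoDemo`, its `hashOf`, `buggy` and `fixed` are hypothetical stand-ins, with primes only for the letters used in the example (e -> 11, h -> 19, l -> 37, o -> 47).

```scala
object TypoDemo {
  // Hypothetical stand-in for PrimeHash.hashOf, covering only the demo's letters.
  private val prime = Map('e' -> 11, 'h' -> 19, 'l' -> 37, 'o' -> 47)
  def hashOf(w: String): BigInt = w.foldLeft(BigInt(1))((acc, c) => acc * prime(c))

  // The typo: t is hashed twice, so any equal-length pair passes.
  def buggy(s: String, t: String): Boolean = s.length == t.length && hashOf(t) == hashOf(t)

  // The fix: hash s and t separately.
  def fixed(s: String, t: String): Boolean = s.length == t.length && hashOf(s) == hashOf(t)
}
```

With the typo, `("hello", "llloh")` is reported as an anagram; with the fix it is correctly rejected.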

answered Apr 03 '14 at 12:55

mesutozer


Thanks, you're right; I double-checked it again and it works now! – Marvin.Hansen Apr 03 '14 at 12:57
It worked for me with other test strings... let me try with others – mesutozer Apr 03 '14 at 13:00
Do you think this will work on a large dataset? 8+2=10 and 5+5=10, so in a large dataset two strings may have the same length and the same sum but different characters. – Somnath Sarode Feb 26 '18 at 13:19
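The concern in this comment holds for adding letter values, but not for multiplying distinct primes, since a product of primes factors uniquely. A small sketch reusing the comment's own numbers: with a hypothetical additive encoding a -> 1 ... z -> 26, "hb" (8+2) and "ee" (5+5) both sum to 10, while their prime products (h -> 19, b -> 3, e -> 11) differ. `SumVsProduct` and its functions are illustrative names.

```scala
object SumVsProduct {
  // Hypothetical additive encoding: 'a' -> 1, 'b' -> 2, ..., 'z' -> 26.
  def letterSum(w: String): Int = w.foldLeft(0)((acc, c) => acc + (c - 'a' + 1))

  // Prime encoding for the few letters used here (alphabetical primes: b -> 3, e -> 11, h -> 19).
  private val prime = Map('b' -> 3, 'e' -> 11, 'h' -> 19)
  def primeProduct(w: String): Int = w.foldLeft(1)((acc, c) => acc * prime(c))
}
```

Sums collide (10 and 10), products do not (57 and 121). Products of primes can only collide through fixed-width overflow, which an exact `BigInt` product rules out.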
