Solved!
I'm working on a function that checks whether two strings are anagrams. The simple version converts both strings into char arrays, sorts them, and compares the two arrays. This works because anagrams contain the same letters in the same order once sorted. For instance, "god" and "dog" both sort to "dgo".
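import java.util.Arrays

def isAnagram2(s: String, t: String): Boolean = {
  // The else branch matters: without it the result of the guard is discarded
  // and null or different-length inputs fall through to the code below.
  if (s == null || t == null || s.length != t.length) false
  else {
    val str1: Array[Char] = s.toCharArray
    val str2: Array[Char] = t.toCharArray
    Arrays.sort(str1) // sorts in place
    Arrays.sort(str2)
    Arrays.equals(str1, str2) // element-wise comparison
  }
}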
The code above compiles and works on Scala 2.10. The output is:
apple, papel: true
carrot, tarroc: true
hello, llloh: false
abba, xyzz: false
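For comparison, the same sort-and-compare idea can probably be written more compactly with Scala's own `sorted` method on strings. This is only a sketch; the object and method names (`SortedAnagram`, `isAnagramSorted`) are made up for illustration and are not part of my original code.

```scala
object SortedAnagram {
  // Hypothetical helper: same idea as isAnagram2, using String.sorted
  // instead of java.util.Arrays. The && chain short-circuits, so null
  // or different-length inputs never reach the sort.
  def isAnagramSorted(s: String, t: String): Boolean =
    s != null && t != null && s.length == t.length && s.sorted == t.sorted
}
```

Calling `SortedAnagram.isAnagramSorted("apple", "papel")` should agree with `isAnagram2` on the four pairs above.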
However, this is not very efficient, since sorting twice takes a while for very long strings. According to this post: Comparing anagrams using prime numbers.
The fastest way of checking two strings for anagrams would be to use prime numbers as a hashing function.
The main idea is:
Assuming both strings have the same length...
1) Generate a hash value for each character by simple substitution, e.g. b -> 3
2) Multiply all hash values; because prime factorization is unique, the product identifies the letters regardless of their order
3) Compare the prime hash of StringA to the prime hash of StringB
If both strings have the same length and are made out of the same characters, they should have the same prime hash.
For example, ‘cat’ and ‘act’ would look like
hash_act = p(a) * p(c) * p(t) and hash_cat = p(c) * p(a) * p(t), where p(x) is the prime assigned to letter x
so hash_act == hash_cat
The point is that this version is independent of order, so it needs no sorting, and it has constant lookup time for each character.
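To make the three steps concrete, here is a minimal sketch. It assumes the letters 'a' to 'z' are mapped, in order, to the first 26 primes (2 up to 101) and that the input contains only those letters. The names `PrimeHashSketch`, `primeOf` and `sameLetters` are hypothetical, and `BigInt` is my choice for the sketch so the product stays exact.

```scala
object PrimeHashSketch {
  // First 26 primes, paired with 'a'..'z' in order (assumed mapping).
  private val primes: Seq[Int] = Seq(
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
    43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101)

  private val primeOf: Map[Char, Int] = ('a' to 'z').zip(primes).toMap

  // Steps 1 and 2: substitute each letter by its prime, then multiply.
  // Throws NoSuchElementException for characters outside a-z.
  def hashOf(word: String): BigInt =
    word.toLowerCase.foldLeft(BigInt(1))((hash, c) => hash * primeOf(c))

  // Step 3: equal products mean the same letters with the same counts.
  def sameLetters(a: String, b: String): Boolean = hashOf(a) == hashOf(b)
}
```

With this mapping, "cat" gives 5 * 2 * 71 = 710 and "act" gives 2 * 5 * 71 = 710, so the two hashes match regardless of letter order.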
In practice, I have an object PrimeHash:
object PrimeHash {
  private[this] final val primeAlphabet: Map[Char, Int] = Map('a' -> 2, 'b' .., 'z' -> 101)

  def hashOf(string: String): Int = {
    string.trim().toLowerCase.foldLeft(1) { (hash, c) => hash * primeAlphabet(c) }
  }
}
and use the hashOf function like so:
def isAnagram(s: String, t: String): Boolean = {
  if (s == null || t == null || s.length != t.length) false
  else if (PrimeHash.hashOf(s).equals(PrimeHash.hashOf(t))) true
  else false
}
However, my simple test case fails to detect non-anagrams. Here is the test code:
def main(args: Array[String]): Unit = {
  val pairs = Array(Array("apple", "papel"), Array("carrot", "tarroc"), Array("hello", "llloh"), Array("abba", "xyzz"))
  for (p <- pairs) {
    val word1 = p(0)
    val word2 = p(1)
    val anagram = isAnagram2(word1, word2)
    println(word1 + ", " + word2 + ": " + anagram)
  }
}
The sorting function correctly detects the two "wrong" pairs, but the hashing one does not.
Full code on github: https://gist.github.com/marvin-hansen/9953592
I'm not entirely sure if the hashOf function is correct.
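One thing that may be worth checking in hashOf is the Int return type: a product of primes can exceed Int.MaxValue for longer words, and Int multiplication wraps around silently. Anagrams would still produce equal (wrapped) values, but different letter sets are then no longer guaranteed to produce different ones. The sketch below only illustrates the wrap-around; `OverflowCheck` and its methods are hypothetical names, not part of PrimeHash.

```scala
object OverflowCheck {
  // Multiply `factor` by itself n times using Int arithmetic (wraps silently).
  def intProduct(factor: Int, n: Int): Int =
    (1 to n).foldLeft(1)((acc, _) => acc * factor)

  // Same product using BigInt, which stays exact.
  def exactProduct(factor: Int, n: Int): BigInt =
    (1 to n).foldLeft(BigInt(1))((acc, _) => acc * factor)
}
```

For example, with 'z' -> 101 a word like "zzzzz" has the exact product 101^5 = 10510100501, which is larger than Int.MaxValue (2147483647), so `intProduct(101, 5)` and `exactProduct(101, 5)` disagree.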
Solution: Fixed a typo that caused the hash of the same value (t) to be compared to itself. Thanks to mesutozer.