What should be the logic of hashfunction() in order to check that two strings are anagrams or not?

Question

I want to write a function that takes string as a parameter and returns a number corresponding to that string.

Integer hashfunction(String a)
{    
    //logic    
}

Actually the question im solving is as follows :

Given an array of strings, return all groups of strings that are anagrams. Represent a group by a list of integers representing the index in the original list.

Input : cat dog god tca

Output : [[1, 4], [2, 3]]

Here is my implementation :-

public class Solution {
    Integer hashfunction(String a)
    {
        int i=0;int ans=0;
        for(i=0;i<a.length();i++)
        {
            ans+=(int)(a.charAt(i));//Adding all ASCII values    
        }

        return new Integer(ans);
    }
    **Obviously this approach is incorrect**
    public ArrayList<ArrayList<Integer>> anagrams(final List<String> a) {
        int i=0;
        HashMap<String,Integer> hashtable=new HashMap<String,Integer>();
        ArrayList<Integer> mylist=new ArrayList<Integer>();
        ArrayList<ArrayList<Integer>> answer=new  ArrayList<ArrayList<Integer>>(); 
        if(a.size()==1)
        {
            mylist.add(new Integer(1));
            answer.add(mylist);
            return answer;
        }

        int j=1;
        for(i=0;i<a.size()-1;i++)
        {   

            hashtable.put(a.get(i),hashfunction(a.get(i)));
            for(j=i+1;j<a.size();j++)
            {

                if(hashtable.containsValue(hashfunction(a.get(j))))
                {
                    mylist.add(new Integer(i+1));
                    mylist.add(new Integer(j+1));
                    answer.add(mylist);
                    mylist.clear();


                }
            }
        }
        return answer;
    }
}

Here is a hint `Output : [[1, 4], [2, 3]]` is not an Integer — Scary Wombat, Jun 16 '16 at 01:03

score 0 · Answer 1 · answered Jun 16 '16 at 01:13

You need a number that is the same for all strings made up of the same characters.

The String.hashCode method returns a number that is the same for all strings made up of the same characters in the same order.

If you can sort all words consistently (for example: alphabetically) then String.hashCode will return the same number for all anagrams.

return String.valueOf(Arrays.sort(inputString.toCharArray())).hashCode();

Note: this will work for all words that are anagrams (no false negatives) but it may not work for all words that are not anagrams (possibly false positives). This is highly unlikely for short words, but once you get to words that are hundreds of characters long, you will start encountering more than one set of anagrams with the same hash code.

Also note: this gives you the answer to the (title of the) question, but it isn't enough for the question you're solving. You need to figure out how to relate this number to an index in your original list.

score 0 · Accepted Answer · answered Jun 16 '16 at 01:57

Oh boy... there's quite a bit of stuff that's open for interpretation here. Case-sensitivity, locales, characters allowed/blacklisted... There are going to be a lot of ways to answer the general question. So, first, let me lay down a few assumptions:

Case doesn't matter. ("Rat" is an anagram of "Tar", even with the capital lettering.)
Locale is American English when it comes to the alphabet. (26 letters from A-Z. Compare this to Spanish, which has 28 IIRC, among which 'll' is considered a single letter and a potential consideration for Spanish anagrams!)
Whitespace is ignored in our definition of an anagram. ("arthas menethil" is an anagram of "trash in a helmet" even though the number of whitespaces is different.)
An empty string (null, 0-length, all white-space) has a "hash" (I prefer the term "digest", but a name is a name) of 1.

If you don't like any of those assumptions, you can modify them as you wish. Of course, that will result in the following algorithm being slightly different, but they're a good set of guidelines that will make the general algorithm relatively easy to understand and refactor if you wish.

Two strings are anagrams if they are exhaustively composed of the same set of characters and the same number of each included character. There's a lot of tools available in Java that makes this task fairly simple. We have String methods, Lists, Comparators, boxed primitives, and existing hashCode methods for... well, all of those. And we're going to use them to make our "hash" method.

private static int hashString(String s) {
    if (s == null) return 0; // An empty/null string will return 0.

    List<Character> charList = new ArrayList<>(); 
    String lowercase = s.toLowerCase(); // This gets us around case sensitivity

    for (int i = 0; i < lowercase.length(); i++) {
        Character c = Character.valueOf(lowercase.charAt(i));
        if (Character.isWhitespace(c)) continue; // spaces don't count
        charList.add(c); // Note the character for future processing...
    }

    // Now we have a list of Characters... Sort it!
    Collections.sort(charList);
    return charList.hashCode(); // See contract of java.util.List#haschCode
}

And voila; you have a method that can digest a string and produce an integer representing it, regardless of the order of the characters within. You can use this as the basis for determining whether two strings are anagrams of each other... but I wouldn't. You asked for a digest function that produces an Integer, but keep in mind that in java, an Integer is merely a 32-bit value. This method can only produce about 4.2-billion unique values, and there are a whole lot more than 4.2-billion strings you can throw at it. This method can produce collisions and give you nonsensical results. If that's a problem, you might want to consider using BigInteger instead.

private static BigInteger hashString(String s) {
    BigInteger THIRTY_ONE = BigInteger.valueOf(31); // You should promote this to a class constant!

    if (s == null) return BigInteger.ONE; // An empty/null string will return 1.

    BigInteger r = BigInteger.ONE; // The value of r will be returned by this method
    List<Character> charList = new ArrayList<>(); 
    String lowercase = s.toLowerCase(); // This gets us around case sensitivity

    for (int i = 0; i < lowercase.length(); i++) {
        Character c = Character.valueOf(lowercase.charAt(i));
        if (Character.isWhitespace(c)) continue; // spaces don't count
        charList.add(c); // Note the character for future processing...
    }

    // Now we have a list of Characters... Sort it!
    Collections.sort(charList);

    // Calculate our bighash, similar to how java's List interface does.
    for (Character c : charList) {
        int charHash = c.hashCode();
        r=r.multiply(THIRTY_ONE).add(BigInteger.valueOf(charHash));
    }

    return r;
}

What should be the logic of hashfunction() in order to check that two strings are anagrams or not?

2 Answers2