Oh boy... there's quite a bit of stuff that's open for interpretation here. Case-sensitivity, locales, characters allowed/blacklisted... There are going to be a lot of ways to answer the general question. So, first, let me lay down a few assumptions:
- Case doesn't matter. ("Rat" is an anagram of "Tar", even with the capital lettering.)
- Locale is American English when it comes to the alphabet. (26 letters from A-Z. Compare this to Spanish, which has 28 IIRC, among which 'll' is considered a single letter and a potential consideration for Spanish anagrams!)
- Whitespace is ignored in our definition of an anagram. ("arthas menethil" is an anagram of "trash in a helmet" even though the number of whitespaces is different.)
- An empty string (null, 0-length, all white-space) has a "hash" (I prefer the term "digest", but a name is a name) of 1.
If you don't like any of those assumptions, you can modify them as you wish. Of course, that will result in the following algorithm being slightly different, but they're a good set of guidelines that will make the general algorithm relatively easy to understand and refactor if you wish.
Two strings are anagrams if they are exhaustively composed of the same set of characters and the same number of each included character. There's a lot of tools available in Java that makes this task fairly simple. We have String methods, Lists, Comparators, boxed primitives, and existing hashCode methods for... well, all of those. And we're going to use them to make our "hash" method.
private static int hashString(String s) {
if (s == null) return 0; // An empty/null string will return 0.
List<Character> charList = new ArrayList<>();
String lowercase = s.toLowerCase(); // This gets us around case sensitivity
for (int i = 0; i < lowercase.length(); i++) {
Character c = Character.valueOf(lowercase.charAt(i));
if (Character.isWhitespace(c)) continue; // spaces don't count
charList.add(c); // Note the character for future processing...
}
// Now we have a list of Characters... Sort it!
Collections.sort(charList);
return charList.hashCode(); // See contract of java.util.List#haschCode
}
And voila; you have a method that can digest a string and produce an integer representing it, regardless of the order of the characters within. You can use this as the basis for determining whether two strings are anagrams of each other... but I wouldn't. You asked for a digest function that produces an Integer, but keep in mind that in java, an Integer is merely a 32-bit value. This method can only produce about 4.2-billion unique values, and there are a whole lot more than 4.2-billion strings you can throw at it. This method can produce collisions and give you nonsensical results. If that's a problem, you might want to consider using BigInteger instead.
private static BigInteger hashString(String s) {
BigInteger THIRTY_ONE = BigInteger.valueOf(31); // You should promote this to a class constant!
if (s == null) return BigInteger.ONE; // An empty/null string will return 1.
BigInteger r = BigInteger.ONE; // The value of r will be returned by this method
List<Character> charList = new ArrayList<>();
String lowercase = s.toLowerCase(); // This gets us around case sensitivity
for (int i = 0; i < lowercase.length(); i++) {
Character c = Character.valueOf(lowercase.charAt(i));
if (Character.isWhitespace(c)) continue; // spaces don't count
charList.add(c); // Note the character for future processing...
}
// Now we have a list of Characters... Sort it!
Collections.sort(charList);
// Calculate our bighash, similar to how java's List interface does.
for (Character c : charList) {
int charHash = c.hashCode();
r=r.multiply(THIRTY_ONE).add(BigInteger.valueOf(charHash));
}
return r;
}