0

All input will consist of lowercase English letters.

HashString("ab") should return a unique value.
HashString("ba") should return the same value as above.

I tried assigning each letter a number, but that turned out to be the wrong logic.

My attempt produced the following output:

HashString("ab")=3
HashString("ba")=3 this is correct.
HashString("c")=3  this is wrong.
Mohanraja
  • What is the *longest string* which can be input? – Dmitry Bychenko Jul 12 '17 at 11:14
  • You should clarify what the requirements are. Your question only mentions the constraint that the input is lowercase-alpha, that the function should be order-invariant and that "ba" and "c" shouldn't collide, which is fairly little information. –  Jul 12 '17 at 11:20
  • Why don't you sort the characters in the input string, and then use any standard hashing algorithm? – m69's been on strike for years Jul 12 '17 at 11:23
  • Anything commutative? `xor(hash(x), hash(reverse(x)))` – Alex K. Jul 12 '17 at 11:27
  • Possible dupe: https://stackoverflow.com/questions/30734848/order-independant-hash-algorithm –  Jul 12 '17 at 11:28
  • why not just `md5(sort(s))`? – fafl Jul 12 '17 at 11:37
  • I am working on the following problem: [link](https://www.hackerrank.com/challenges/sherlock-and-anagrams/problem). Given a string of length n, it asks for the count of all substrings that have the same letter frequencies (for example, aabb and abab have the same frequencies). That's why I wanted to hash every substring, so that I can easily find which ones have the same frequencies. Because of the time complexity, I can't afford to sort the string. – Mohanraja Jul 12 '17 at 12:16
  • I don't think generating all possible strings and then comparing them (or their hash) is ever going to be fast enough. Use a mathematical approach instead. The number of unique permutations depends on the number of duplicate letters: abcd = 24 ; aabc = 12 ; aabb = 6 ; aaab = 4 ; aaaa = 1... – m69's been on strike for years Jul 12 '17 at 13:38

3 Answers

2

The first thing that comes to mind in the vein of the attempt in the question is to assign every letter a prime number, and multiply them. Then, "ab" is 2*3 = 6; "ba" is 3*2 = 6; "c" is 5.
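
A minimal Python sketch of this idea, assuming the letters a to z are mapped to the first 26 primes in order (so a = 2, b = 3, c = 5). The names `first_primes`, `PRIME_FOR` and `hash_string` are made up for illustration. Python integers grow as needed, so the product never overflows; in a language with fixed-width integers it would, and collisions would become possible.

```python
from string import ascii_lowercase


def first_primes(count):
    """Return the first `count` primes by trial division (fine for 26)."""
    primes = []
    candidate = 2
    while len(primes) < count:
        # candidate is prime if no smaller prime divides it
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


# Assumed mapping: 'a' -> 2, 'b' -> 3, 'c' -> 5, ..., 'z' -> 101
PRIME_FOR = dict(zip(ascii_lowercase, first_primes(26)))


def hash_string(s):
    """Multiply the primes of all letters; multiplication ignores order."""
    product = 1
    for ch in s:
        product *= PRIME_FOR[ch]
    return product


print(hash_string("ab"), hash_string("ba"), hash_string("c"))
```

Because prime factorisation is unique, two strings get the same product only if they contain the same letters the same number of times.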

John C
  • One should note that, while this is a good answer in theory (it is collision-free if your integers are arbitrarily large), it is a disadvantage in practical applications if your hash values have many (small) divisors. Hash maps usually use the hash values mod n (for some n that can vary). Having many divisors can lead to many collisions mod n. – J Fabian Meier Jul 12 '17 at 11:39
1

No, because there are infinitely many possible Strings, but there is only a finite number of possible Hash values.

You cannot have a collision-free hash function on strings, but you can design your function to have as few collisions as possible for the expected input values.

J Fabian Meier
0

As others have mentioned, you can't ensure that every string with different letters produces a different hash, because there are only 2^32 (or 2^64) different hashes available, and way more different combinations of letters than that.

But if you just want to make a hash function that doesn't care about the order of the characters in the string, then the simplest thing to do is sort the characters in the string (so "canada" would become "aaacdn", for example), and then hash the result.
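
As a rough sketch of the sort-then-hash approach in Python: the function names `canonical_form` and `order_independent_hash` are hypothetical, and SHA-256 from the standard `hashlib` module is only one possible choice for "hash the result"; any ordinary string hash would do.

```python
import hashlib


def canonical_form(s):
    """Sort the characters so every reordering maps to the same string."""
    return "".join(sorted(s))


def order_independent_hash(s):
    """Hash the sorted string; SHA-256 is an arbitrary choice here."""
    return hashlib.sha256(canonical_form(s).encode("ascii")).hexdigest()


print(canonical_form("canada"))
print(order_independent_hash("ab") == order_independent_hash("ba"))
```

Sorting costs O(n log n) per string with a general-purpose sort, which may matter if it has to be done for a very large number of substrings.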

Another common way is to map each character to a random-looking number, and then just add together the numbers for all the characters.
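
One way this could look in Python, as a sketch under a few assumptions: the per-letter values come from a seeded `random.Random` (the seed 12345 is arbitrary), the sum is truncated to 64 bits to mimic a fixed-width hash, and the names `LETTER_VALUE` and `additive_hash` are invented for the example.

```python
import random
from string import ascii_lowercase

MASK = (1 << 64) - 1  # keep the running sum within 64 bits

# Fixed, arbitrary seed so the table is the same on every run
_rng = random.Random(12345)
LETTER_VALUE = {ch: _rng.getrandbits(64) for ch in ascii_lowercase}


def additive_hash(s):
    """Add the per-letter values; addition ignores character order."""
    total = 0
    for ch in s:
        total = (total + LETTER_VALUE[ch]) & MASK
    return total


print(additive_hash("ab") == additive_hash("ba"))
```

Unlike small consecutive values such as a = 1, b = 2, c = 3, large random-looking values make it very unlikely that the values for "a" and "b" add up to the value for "c", although collisions can still occur. Since the hash is a plain sum, it can also be updated in constant time when a character is appended to or removed from a string.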

Matt Timmermans