
Given a string, how can I create a unique identifier (hash) for it, so that I can track occurrences of the string without actually logging the original string?

For example, the URL "www.mylittlesecret.com" should show up as "xyz123" (the hash code for that string). The URL should always translate into xyz123, but it should not be possible to determine the URL from xyz123.

Sorry if those are the wrong terms. I am happy to read more about "hashing" if somebody could provide me the right keywords.

Cilvic
  • Possible dup of: [Creating your own TinyURL](http://stackoverflow.com/q/1075409/338665). (it has these questions answered at least)... – ircmaxell Feb 23 '11 at 21:46
  • One important question is whether it is a requirement that two strings can't result in the same hash. This is called a "collision". If this is a requirement, you're probably looking at longer hashes than "xyz123". If a lack of collisions is a requirement, look into "cryptographic hash function", particularly the SHA-2 family of functions. – Jacob Mattison Feb 23 '11 at 21:49
  • Thank you, I can't really say whether it's a requirement. I wouldn't expect collisions to be a problem because I only have hundreds of strings per user and the effect of a collision wouldn't be dramatic. But I'll have a look at SHA-2 anyway; maybe it doesn't hurt to use it. – Cilvic Feb 24 '11 at 06:56

1 Answer


If you use a hash algorithm like SHA1 you will get the desired behavior. You will not be able to reconstruct your URL from the hash, but you can compare the hashes and see if the URLs are the same or not.
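As a minimal sketch of this idea in Python, the standard `hashlib` module can turn a string into a fixed-length hex digest. The function name `url_fingerprint`, the `algorithm` parameter, and the choice of UTF-8 encoding are assumptions made for illustration, not something the question specifies.

```python
import hashlib


def url_fingerprint(url: str, algorithm: str = "sha1") -> str:
    """Return a hex digest that stands in for `url` in a log.

    `algorithm` is any name accepted by hashlib.new(), for example
    "sha1" (40 hex characters) or "sha256" (64 hex characters).
    """
    hasher = hashlib.new(algorithm)
    # Hash functions work on bytes, so the string is encoded first.
    # UTF-8 is an assumption; any encoding works if used consistently.
    hasher.update(url.encode("utf-8"))
    return hasher.hexdigest()


if __name__ == "__main__":
    first = url_fingerprint("www.mylittlesecret.com")
    second = url_fingerprint("www.mylittlesecret.com")
    other = url_fingerprint("www.example.com")

    # The same input always gives the same digest, so occurrences can be
    # counted by digest instead of by URL.
    print(first == second)
    # Different inputs give different digests (barring a collision).
    print(first == other)
```

Note that the digest depends on the exact bytes of the input, so "www.mylittlesecret.com" and "WWW.MyLittleSecret.com" produce different digests unless the strings are normalized before hashing.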

But if someone wants to find out which URLs you have, you will be subject to dictionary-like attacks, where an attacker simply takes a list of all known web sites, hashes each one, and checks whether any of the hashes match. So that might be something to watch out for.
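To make that risk concrete, here is a rough sketch of such a dictionary attack in Python. It assumes the log contains plain, unsalted SHA1 hex digests of the URLs; the helper names `sha1_hex` and `recover_urls` and the sample candidate list are hypothetical and only serve as illustration.

```python
import hashlib
from typing import Dict, Iterable


def sha1_hex(text: str) -> str:
    """Plain, unsalted SHA1 hex digest of a string (UTF-8 assumed)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def recover_urls(logged_hashes: Iterable[str],
                 candidate_urls: Iterable[str]) -> Dict[str, str]:
    """Map each logged hash back to a URL if the URL is in the candidate list."""
    # The attacker hashes every known URL once and builds a lookup table.
    lookup = {sha1_hex(url): url for url in candidate_urls}
    # Any logged hash found in the table reveals the original URL.
    return {h: lookup[h] for h in logged_hashes if h in lookup}


if __name__ == "__main__":
    # What the log would contain: hashes only, no URLs.
    log = [sha1_hex("www.mylittlesecret.com")]
    # What an attacker might bring: a list of known web sites.
    known_sites = ["www.example.com", "www.mylittlesecret.com"]
    print(recover_urls(log, known_sites))
```

The hash is never inverted here. The attacker only needs the original URL to appear somewhere in the candidate list, which is why this matters for inputs such as well-known web sites.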

Anders Zommarin