
What is the quickest, best-performing way to turn a string into a string of numbers and/or letters, and then reverse it back to the original string? It would be similar to turning a string into its hash code, except that's a one-way conversion; I need a two-way method. I'm creating a simple URL shortening service and I don't want to deal with a database.

I considered MD5 encrypting/decrypting via a private key, but I imagine there's another way that might be better on performance.

If encrypting/decrypting is the way to go, then which is the easiest on the processor?

Thanks!

Levitikon
  • Google "pigeonhole principle" to find out why un-hashing isn't feasible (and why you can't arbitrarily and losslessly shorten a string by other means). – cHao Mar 23 '12 at 23:19
  • Even if it worked, you would probably only be able to reduce the length of strings by a relatively small margin (nowhere near what the common URL shorteners achieve). Also, MD5 is not encryption and does not use any private or public keys. Unless you have a magically compressing encryption algorithm, encrypting a URL will most likely never give you a URL that's significantly shorter (if at all) than the original URL. You should probably use a database. :) – hangy Mar 23 '12 at 23:25

2 Answers


When you encrypt you don't shorten anything: the cipher text is roughly the same length as the clear text. If you use a cryptographic hash instead, you shorten the string to the length of the hash, but the drawback is that you can no longer reverse the hash back to the original string. I don't think you will be able to create a URL shortener using either an encryption algorithm or a cryptographic hash function. If that were possible, you could achieve infinite, or at least very high, compression of arbitrary information.
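To make the first point concrete, here's a minimal sketch (my illustration, not part of the original answer) using .NET's built-in AES implementation. Because a "short" URL has to be text, the cipher bytes also need a text encoding such as Base64, so the result ends up longer than the input, not shorter:

    using System;
    using System.Security.Cryptography;
    using System.Text;

    class EncryptionLengthDemo
    {
        static void Main()
        {
            string url = "http://github.com/antirez/smaz/tree/master";

            using (Aes aes = Aes.Create())
            {
                // Aes.Create() generates a random key and IV for us.
                ICryptoTransform encryptor = aes.CreateEncryptor();
                byte[] plain = Encoding.UTF8.GetBytes(url);
                byte[] cipher = encryptor.TransformFinalBlock(plain, 0, plain.Length);

                // Padding rounds the cipher text up to the 16-byte block size,
                // and Base64 expands it by roughly another third.
                string encoded = Convert.ToBase64String(cipher);

                Console.WriteLine("Original:  {0} chars", url.Length);     // 42
                Console.WriteLine("Encrypted: {0} chars", encoded.Length); // 64
            }
        }
    }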

Martin Liversage

What you'd need is a lossless compression algorithm. That's the only way you'd be able to compress the text and still be able to decompress it back to the original.

With compression the output length is going to vary, of course, and on average it won't be as small as the Base64 IDs that tinyurl uses. They're able to keep those short because they store the ID and URL in a database.

Nevertheless, here are a couple options...

If you go this route, I'd create a little console application to test the performance of all of this. If you have the RAM, it might even be worth caching the results in a Dictionary so you're not constantly compressing and decompressing URLs on every request.
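Here's a minimal sketch of that compress/decompress round trip (my illustration, not from the original answer), using .NET's built-in DeflateStream plus a Base64 step so the result can live in a URL. Note that standard Base64 uses '+', '/', and '=', which would need escaping or a URL-safe alphabet:

    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Text;

    static class UrlCodec
    {
        // Compress a URL and Base64-encode the result so it can be used as text.
        public static string Shorten(string url)
        {
            byte[] input = Encoding.UTF8.GetBytes(url);
            using (var output = new MemoryStream())
            {
                using (var deflate = new DeflateStream(output, CompressionMode.Compress))
                {
                    deflate.Write(input, 0, input.Length);
                } // disposing flushes the compressed data into 'output'
                return Convert.ToBase64String(output.ToArray());
            }
        }

        // Reverse the process: Base64-decode, then inflate back to the URL.
        public static string Expand(string token)
        {
            byte[] compressed = Convert.FromBase64String(token);
            using (var input = new MemoryStream(compressed))
            using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
            using (var reader = new StreamReader(deflate, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }

For very short inputs, Deflate's overhead plus the roughly 33% Base64 expansion can actually make the token longer than the original URL, which is part of why this won't rival a database-backed shortener.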

UPDATE

There's a library called smaz which is apparently designed for compressing small strings...

It can compress URLs pretty well:

'http://google.com' compressed by 59%

'http://programming.reddit.com' compressed by 52%

'http://github.com/antirez/smaz/tree/master' compressed by 46%
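smaz is a C library, so calling it from .NET would mean building it as a native library and using P/Invoke. Here's a rough sketch of what that could look like, assuming you've compiled smaz into a native binary named "smaz" (the two extern signatures mirror the declarations in smaz.h; the wrapper code and buffer sizing are my own scaffolding):

    using System;
    using System.Runtime.InteropServices;
    using System.Text;

    static class Smaz
    {
        // Declarations mirroring smaz.h; the library name "smaz" is an
        // assumption about how you've built and deployed the native binary.
        [DllImport("smaz")]
        static extern int smaz_compress(byte[] input, int inlen, byte[] output, int outlen);

        [DllImport("smaz")]
        static extern int smaz_decompress(byte[] input, int inlen, byte[] output, int outlen);

        public static byte[] Compress(string url)
        {
            byte[] input = Encoding.ASCII.GetBytes(url);
            // smaz can occasionally expand its input, so over-allocate.
            byte[] buffer = new byte[input.Length * 2 + 8];
            int written = smaz_compress(input, input.Length, buffer, buffer.Length);
            if (written > buffer.Length)
                throw new InvalidOperationException("Output buffer too small.");
            byte[] result = new byte[written];
            Array.Copy(buffer, result, written);
            return result;
        }

        public static string Decompress(byte[] compressed)
        {
            // Over-allocate generously; smaz returns the number of bytes written.
            byte[] buffer = new byte[compressed.Length * 8 + 8];
            int written = smaz_decompress(compressed, compressed.Length, buffer, buffer.Length);
            if (written > buffer.Length)
                throw new InvalidOperationException("Output buffer too small.");
            return Encoding.ASCII.GetString(buffer, 0, written);
        }
    }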

Steve Wortham