I am running into an annoying issue with a CRC16 hash that I use to manage some of my data.

In my application, I pass a large encoded context in a URL parameter. That context lets users recover their old searches. Some elements in the context are hashed so that they do not take up too many characters.

It seems that some distinct elements produce the same hash (CRC16 algorithm).

I take the hash and convert it to a string: crc.ToString("X4"). For example, two different elements both give me 5A8E.

I tried switching to CRC32, but then the old contexts are no longer recognized.

Do you have any idea how I can solve this? Thanks a lot.

Loot

2 Answers

Even if CRC16 were an ideal hash function (which it's not), with just 16 bits, the Birthday Paradox means that there's roughly a 40% chance of at least one hash collision in a set of just 2^8 = 256 items, and the chance passes 50% at around 300 items. You almost certainly need more bits.
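The birthday estimate can be checked numerically. The following is a minimal Python sketch (not the asker's C# code) that assumes an ideal, uniformly distributed hash; the function name `collision_probability` is made up for illustration.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Probability that at least two of n_items share a hash value,
    assuming each hash is drawn uniformly from 2**bits possibilities."""
    space = 2 ** bits
    if n_items > space:
        return 1.0  # pigeonhole: a collision is guaranteed
    # P(no collision) = product over k of (1 - k/space); sum the logs
    # to keep precision when each factor is very close to 1.
    log_no_collision = sum(math.log1p(-k / space) for k in range(n_items))
    return -math.expm1(log_no_collision)


if __name__ == "__main__":
    for n in (100, 256, 302, 1000):
        p16 = collision_probability(n, 16)
        p32 = collision_probability(n, 32)
        print(f"{n:5d} items: 16-bit {p16:.3f}, 32-bit {p32:.6f}")
```

Under the same idealized assumption, a 32-bit hash keeps the collision chance for a few hundred items far below one percent, which is why more bits help.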

You can't keep the old hashes working and make them distinguish existing collisions -- that's a contradiction. But you can implement a new, better hashing scheme, add a flag to the URL parameters to indicate that you're using this new scheme, make sure that all your pages generate only these new-style URLs, and "grandfather in" the old-style URLs (which will continue to produce the same collisions as before). I'd suggest giving users a big, bright message to update their bookmarks, and auto-redirecting the page, whenever you get an old-style URL.
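As a rough illustration of the flagged-scheme idea, here is a Python sketch under several assumptions: the `v2.` prefix, the function names, and the use of `binascii.crc_hqx` (CRC-CCITT) as a stand-in for the asker's unspecified CRC16 variant are all hypothetical, and CRC32 is used as the "better" hash only because the question mentions it.

```python
import binascii
import zlib

NEW_PREFIX = "v2."  # hypothetical flag marking new-style tokens


def old_token(element: str) -> str:
    """Legacy 4-hex-digit token (stand-in CRC16; the real variant is unknown)."""
    return format(binascii.crc_hqx(element.encode("utf-8"), 0), "04X")


def new_token(element: str) -> str:
    """New-style token: flag prefix plus an 8-hex-digit CRC32."""
    return NEW_PREFIX + format(zlib.crc32(element.encode("utf-8")), "08X")


def resolve(token: str, elements: list[str]) -> tuple[list[str], bool]:
    """Return (matching elements, is_legacy).

    Legacy tokens can still match several elements, exactly as before;
    the is_legacy flag tells the caller to redirect to a new-style URL
    and ask the user to update the bookmark.
    """
    if token.startswith(NEW_PREFIX):
        return [e for e in elements if new_token(e) == token], False
    return [e for e in elements if old_token(e) == token], True


if __name__ == "__main__":
    items = ["alpha", "beta"]
    print(resolve(new_token("alpha"), items))
    print(resolve(old_token("beta"), items))
```

The explicit prefix means the server never has to guess which scheme produced a token, so the old and new formats can coexist for as long as old bookmarks are supported.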

j_random_hacker

To explain the other solution I thought of and am now implementing:

I compute both a CRC32 and a CRC16 hash for each element. I use the CRC32 for the new URLs I now build, and keep the CRC16 hash as a fallback for old URLs.

So, when I compare hashes, I start with the new hash and, if I can't find any matching element, I fall back to comparing against the CRC16 hash.

This lets me handle both cases.
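The lookup order described above could be sketched as follows. This is a Python illustration rather than the actual application code: `ElementIndex`, `crc16_hex`, and `crc32_hex` are hypothetical names, and `binascii.crc_hqx` is only a stand-in for whichever CRC16 variant the application really uses.

```python
import binascii
import zlib
from collections import defaultdict


def crc16_hex(text: str) -> str:
    """4-hex-digit legacy hash (stand-in CRC16 variant, an assumption)."""
    return format(binascii.crc_hqx(text.encode("utf-8"), 0), "04X")


def crc32_hex(text: str) -> str:
    """8-hex-digit hash used for newly built URLs."""
    return format(zlib.crc32(text.encode("utf-8")), "08X")


class ElementIndex:
    """Indexes every element under both hashes."""

    def __init__(self, elements):
        self.by_crc32 = defaultdict(list)
        self.by_crc16 = defaultdict(list)
        for element in elements:
            self.by_crc32[crc32_hex(element)].append(element)
            self.by_crc16[crc16_hex(element)].append(element)

    def find(self, token: str) -> list:
        """Try the new hash first, then fall back to the legacy hash."""
        matches = self.by_crc32.get(token)
        if matches:
            return list(matches)
        # Legacy tokens may map to several elements (the old collisions).
        return list(self.by_crc16.get(token, []))


if __name__ == "__main__":
    index = ElementIndex(["red", "green", "blue"])
    print(index.find(crc32_hex("green")))  # new-style token
    print(index.find(crc16_hex("green")))  # old-style token
```

Because the two token formats differ in length (8 versus 4 hex digits), a legacy token cannot accidentally match a key in the CRC32 index. Old URLs still carry whatever collisions they had before.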

Loot
  • Is the new hash longer? (If it's the same size, you'll just introduce more collisions for everyone -- old and new.) – j_random_hacker Oct 21 '15 at 14:44
  • Yes longer, new one is based on 8 characters :) And it currently works like a charm, waiting for the test team to finish their crazy tests :) – Loot Oct 21 '15 at 15:51
  • It still sounds like you're trying to use a hash to uniquely identify something. If the consequence of a hash collision is catastrophic, you probably shouldn't be using hashing - and you _definitely_ shouldn't be using short, non-cryptographically-secure hashes. – Nick Johnson Oct 21 '15 at 15:56