
I read a paper that explained that using CRCs generated from the CRC-64-ISO algorithm as hash keys is likely to result in collisions for large sets of data. Postmodern's Ruby CRC project is pretty interesting, but the CRC64 class seems to be using the CRC-64-ISO algorithm.

I'm hoping to generate probably-unique IDs from canonical input that are stable and somewhat human-friendly, e.g., easy to use in a spreadsheet that is maintained by hand. I would just use SHA-1 digests, but at 40 hex characters they're pretty long.

I'm only familiar with the basics of hash keys. I barely caught the CRC-64-ISO issue, and I don't feel competent at this point to put together a class with better hashing characteristics. Is there an existing Ruby library that has something I can use here?
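To get a sense of how short a truncated ID can safely be, the standard birthday approximation estimates the chance that any two of n random b-bit IDs collide as p ~ 1 - exp(-n(n-1) / 2^(b+1)). The sketch below assumes the hash output behaves like uniformly random bits; the method name `collision_probability` is hypothetical, not from any library.

```ruby
# Hypothetical helper: birthday-bound estimate of the probability that at
# least two of n IDs collide, ASSUMING each ID is `bits` uniformly random bits.
#   p ~ 1 - exp(-n * (n - 1) / 2**(bits + 1))
def collision_probability(n, bits)
  exponent = n * (n - 1) / (2.0**(bits + 1))
  1.0 - Math.exp(-exponent)
end

# Compare a few truncation lengths for one million IDs.
# Each hex character carries 4 bits, so 12 hex characters = 48 bits.
[32, 48, 64].each do |bits|
  printf("%2d bits (%2d hex chars): p = %.3g\n",
         bits, bits / 4, collision_probability(1_000_000, bits))
end
```

With one million items, 32-bit IDs are almost certain to collide somewhere, while 64-bit IDs keep the estimated risk very small; the right length depends on how many rows the spreadsheet will ever hold.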

Eric Walker

2 Answers


CRCs are designed for error checking, not for hash-table lookups. You should be using Bob Jenkins's SpookyHash, Google's CityHash, or TMMHv2 for such purposes. Cryptographic hashes like MD5 will also work, but they are comparatively slow.
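One concrete reason CRCs make poor hash keys is that they are affine over GF(2): for messages of equal length, crc(a) XOR crc(b) XOR crc(c) equals crc(a XOR b XOR c). That structure is what makes CRCs good at detecting bit errors, but it also means structured relationships between inputs carry straight through to the outputs. The sketch below checks this with CRC-32 from Ruby's standard `zlib` library rather than CRC-64-ISO (CRC-64 variants are computed the same way with a different polynomial); `xor3` is a hypothetical helper.

```ruby
require 'zlib'

# Hypothetical helper: XOR three equal-length byte strings, byte by byte.
def xor3(a, b, c)
  unless a.bytesize == b.bytesize && b.bytesize == c.bytesize
    raise ArgumentError, "inputs must be the same length"
  end
  a.bytes.zip(b.bytes, c.bytes).map { |x, y, z| x ^ y ^ z }.pack("C*")
end

# Three arbitrary 20-byte inputs.
a = "spreadsheet-row-0001"
b = "spreadsheet-row-0002"
c = "another-canonical-id"

lhs = Zlib.crc32(a) ^ Zlib.crc32(b) ^ Zlib.crc32(c)
rhs = Zlib.crc32(xor3(a, b, c))
puts lhs == rhs # prints "true" for any three equal-length inputs
```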


You can calculate the MD5 or SHA-1 digest and just truncate the output.
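A minimal sketch of that approach using Ruby's standard `digest` library. The method name `short_id` and the 12-character default (48 bits) are illustrative assumptions; pick the length based on how many IDs you expect to generate.

```ruby
require 'digest/sha1'

# Hypothetical helper: derive a short, stable ID from canonical input by
# truncating the SHA-1 hex digest. Each hex character carries 4 bits, so
# the default of 12 characters keeps 48 bits of the 160-bit digest.
def short_id(canonical_input, hex_chars: 12)
  unless (1..40).cover?(hex_chars)
    raise ArgumentError, "hex_chars must be between 1 and 40"
  end
  Digest::SHA1.hexdigest(canonical_input)[0, hex_chars]
end

puts short_id("abc") # prints "a9993e364706"
```

Because the digest depends only on the input bytes, the same canonical input always yields the same ID, which is what makes it usable as a hand-maintained spreadsheet key.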

rogerdpack