
I need a checksum/fingerprint function for short strings (say, 16 to 256 bytes) that fits in a 24-bit word. Is there a well-known algorithm for that?

Igor Gatis
  • What language/platform are you using? Why 24 bits? And what are you trying to do? – wrschneider Nov 30 '11 at 14:39
  • C++, Java and Python, but it needs to be implementable in most popular programming languages, such as JavaScript, C#, Ruby, etc. 24 bits because that's the amount of space my app has, and my app needs to generate fingerprints for short strings. Apart from the nature of the input, why would those details matter? – Igor Gatis Nov 30 '11 at 16:20

2 Answers


I propose using a 24-bit CRC as an easy solution. CRCs are defined for many widths, including 24 bits, and are always simple to compute; Wikipedia has a matching entry. The quality is far better than that of a sum reduced modulo 2^24, because swapping characters will most likely produce a different CRC, while it never changes a plain sum.
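A minimal bitwise CRC-24 sketch in Python. The answer does not name a particular polynomial, so this assumes the CRC-24 variant used by OpenPGP (RFC 4880), with generator 0x1864CFB and initial value 0xB704CE. A table-driven version would be faster, but this loop is short enough to port directly to the other languages mentioned in the comments.

```python
# CRC-24 parameters as used by OpenPGP (RFC 4880); assumed choice of variant.
CRC24_INIT = 0xB704CE
CRC24_POLY = 0x1864CFB  # generator polynomial, including the x^24 bit


def crc24(data: bytes) -> int:
    """Return the 24-bit CRC of `data`, processed MSB-first, one bit at a time."""
    crc = CRC24_INIT
    for byte in data:
        # Feed the next byte into the top 8 bits of the 24-bit register.
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            # If a bit shifted out past bit 23, reduce by the generator.
            if crc & 0x1000000:
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF
```

Strings must be encoded to bytes first, for example `crc24("hello".encode("utf-8"))`, and every implementation has to use the same encoding.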

The next step (if there is a real threat that a wrong string could have the same checksum) would be a cryptographic MAC such as CMAC. Its standard output is longer than 24 bits, but it can be reduced by keeping only the first 24 bits.
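Python's standard library has no CMAC (third-party packages such as `cryptography` provide one), so this sketch swaps in HMAC-SHA256, a different MAC construction, to show the truncation idea. The function name `mac24` is hypothetical. Keep in mind that a 24-bit tag gives limited protection: a random guess is accepted with probability 1 in 2^24.

```python
import hashlib
import hmac


def mac24(key: bytes, message: bytes) -> int:
    """Hypothetical helper: HMAC-SHA256 of `message`, truncated to its first 24 bits."""
    tag = hmac.new(key, message, hashlib.sha256).digest()  # 32-byte tag
    # Keep the first 3 bytes (24 bits) and read them as a big-endian integer.
    return int.from_bytes(tag[:3], "big")
```

Unlike a CRC, the result depends on a secret key, so someone without the key cannot deliberately construct a different string with the same fingerprint (beyond guessing).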

guidot

The simplest thing to do is a basic checksum: add up the bytes in the string, modulo 2^24.
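A minimal sketch of that byte sum in Python, assuming UTF-8 as the agreed encoding (the function name `sum24` is hypothetical):

```python
def sum24(text: str, encoding: str = "utf-8") -> int:
    """Sum of the encoded bytes of `text`, reduced modulo 2**24."""
    return sum(text.encode(encoding)) % (1 << 24)


print(sum24("abc"))                 # 294 (97 + 98 + 99)
print(sum24("ab") == sum24("ba"))   # True: reordering bytes never changes the sum
```

Note that for the stated input range the modulo never takes effect: 256 bytes of at most 255 each sum to at most 65,280, so only about 16 of the 24 available bits are ever used.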

You have to watch out for character-set issues when converting to bytes, though, so that everyone agrees on the same encoding of characters to bytes.
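To illustrate why the encoding must be fixed, this sketch sums the bytes of the same string under two encodings. The helper name `byte_sum24` is hypothetical.

```python
def byte_sum24(text: str, encoding: str) -> int:
    """Byte-sum checksum modulo 2**24 of `text` encoded with `encoding`."""
    return sum(text.encode(encoding)) % (1 << 24)


s = "café"
# 'é' is one byte (0xE9) in Latin-1 but two bytes (0xC3 0xA9) in UTF-8.
print(byte_sum24(s, "utf-8"))    # 662
print(byte_sum24(s, "latin-1"))  # 531
```

Any fingerprint (sum, CRC or MAC) has the same problem, so the encoding is effectively part of the algorithm's specification.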

wrschneider