
I would like to send some data to a device and need to verify its integrity. There is no attacker to worry about, only hardware faults.

The maximum data size in my case will be about 256 kB.

I'm interested in algorithms with a small footprint and a small hash size. Something like CRC8, CRC16, or CRC32, but MD5 or SHA1 could also be used. SHA2 hashes are too large for me.

Is there a general rule for a practical data size limit?

j123b567

2 Answers


No. A SHA-1 hash is to all intents and purposes globally unique, and the algorithm does not break down for very large inputs. If you change a single bit, the hash should change.

Malcolm McLean
  • I mean something like the "birthday problem": when processing 257 different files with CRC8, there is a 100% probability that two of them will have the same CRC. On the other hand, cryptographic hashes are slow and produce large hashes (20 bytes for SHA1). So I need some golden mean. – j123b567 Oct 13 '16 at 10:17
  • I think it does not matter how strong the algorithm is. When the message (or data) string is longer than the error-check string, then as you correctly noted, there will be cases of duplicate mapping of message strings to an error check string. – ysap Nov 12 '16 at 20:06
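
The two probabilities discussed in these comments can be put into numbers. The sketch below is only an illustration, assuming the checksum behaves like a uniformly random value; the function names `collision_probability` and `undetected_garbage_probability` are made up for this example. It separates the birthday-style question (do any two of n different files share a checksum?) from the integrity question (does one corrupted transfer happen to keep the checksum of the original?), which is the one that matters when verifying a single message.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Chance that at least two of n_items uniformly random
    hash_bits-bit checksums are equal (the birthday problem)."""
    space = 2 ** hash_bits
    if n_items > space:
        return 1.0  # pigeonhole: more items than possible checksum values
    # Sum the logs of the "all distinct so far" factors for numerical stability.
    log_all_distinct = sum(math.log1p(-i / space) for i in range(n_items))
    return -math.expm1(log_all_distinct)


def undetected_garbage_probability(hash_bits: int) -> float:
    """Chance that completely random garbage matches one given checksum,
    assuming checksum values are uniformly distributed."""
    return 2.0 ** -hash_bits


if __name__ == "__main__":
    for bits in (8, 16, 32):
        print(
            f"{bits:2d}-bit checksum: "
            f"any collision among 257 files = {collision_probability(257, bits):.3g}, "
            f"garbage accepted as valid = {undetected_garbage_probability(bits):.3g}"
        )
```

Under that uniformity assumption, the chance of accepting a garbled message depends on the checksum width rather than on the message length, so a wider checksum lowers it regardless of whether the payload is 1 kB or 256 kB.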

You'd need to know something about the error characteristics of your channel and what would be an acceptable false positive rate for your application. How often is there an error? What is the distribution of the number of bits changed? Do you have a single bit flipped occasionally, or do you get lots of bits flipped or the whole message munged when there is an error? Are the flipped bits close to each other, i.e. do the errors occur in bursts?

In general, you would not use a cryptographic hash, since the extra time spent computing it compared to a CRC gives you no benefit. You should use a CRC or another non-cryptographic hash, such as one from the xxHash family. They are very fast, and are about as good as you can get at keeping the probability of false positives low. A CRC also has special properties that protect against a burst of errors, i.e. several adjacent or nearly adjacent bit flips.
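
As a rough illustration of the CRC approach, here is a minimal Python sketch. `zlib.crc32` is the standard-library CRC-32; `crc32_bitwise` and `flip_bit` are hypothetical helper names written for this example, and the payload is made-up sample data. The bitwise loop shows the table-free form that might be ported to a small device, but it is a sketch and not tuned firmware code.

```python
import zlib

CRC32_POLY_REFLECTED = 0xEDB88320  # standard CRC-32 polynomial, bit-reflected


def crc32_bitwise(data: bytes) -> int:
    """Table-free CRC-32 (same variant as zlib.crc32), one bit at a time."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; apply the polynomial when the dropped bit was 1.
            crc = (crc >> 1) ^ CRC32_POLY_REFLECTED if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating a hardware fault."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


# The bitwise version agrees with zlib on the usual CRC test string.
assert crc32_bitwise(b"123456789") == zlib.crc32(b"123456789")

payload = bytes(range(256)) * 1024       # 256 KiB of sample data
sent_crc = zlib.crc32(payload)           # sender transmits this with the payload

received = flip_bit(payload, 123_456)    # one bit corrupted in transit
if zlib.crc32(received) != sent_crc:
    print("corruption detected, request retransmission")
```

The bitwise form needs no lookup table, which keeps the footprint small at the cost of speed; a table-driven variant is faster but typically adds a 1 KiB table (256 entries of 4 bytes).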

Mark Adler
  • I don't have any numbers. I can only say that a hardware error is very unlikely. It may be just a flipped bit, but in some rare cases it can be complete garbage. I didn't know about xxHash; it sounds interesting. – j123b567 Oct 13 '16 at 10:08
  • By mentioning xxHash, you have pointed me to the [SMHasher](https://github.com/aappleby/smhasher) tool, which is just what I need. – j123b567 Oct 13 '16 at 12:57