Fast and robust checksum algo for a small data file (~10KB)

Question

I have a data file that needs to be pushed to an embedded device. The file size ranges from a few bytes to about 10 KB at most. My intention is to detect tampering with the contents of this file (the checksum is to be the last element in the data file). The data is a mix of strings and signed and unsigned integers. I am looking for a robust algo with few collisions that also does not use a lot of cycles to compute. I am considering Fletcher16(), CRC-32 and the solution discussed in this post

Any suggestions for a simple algo for my kind of data size/contents?

Thanks in advance!

EDIT: Thanks, everyone, for the insightful answers and suggestions.

Some background: This is not a hyper-secure data file. I just want to be able to detect whether someone modified it by mistake. The file is generated by a module and should only be read by the SW. Recently there have been a few instances where folks pulled it from the target file system, edited it, and pushed it back to the target hoping that would fix their problems (which, btw, it would if edited carefully). But this defeats the very purpose of auto-generating this file and the existence of this module. I would like to detect such playful "hacks" and abort gracefully.

Rather than a checksum, you should use a cryptographically secure hash function if you want to protect against tampering. — Juan Carlos Ramirez, Mar 06 '19 at 22:49
Which is more important: "detect tampering" or "does not use a lot of cycles to compute" or _both_ (meaning you'll get neither). — chux - Reinstate Monica, Mar 07 '19 at 00:12
@chux: Both, I guess :) I have updated the question with some background on where this is coming from. — Zakir, Mar 07 '19 at 02:10

score 4 · Accepted Answer · answered Mar 07 '19 at 00:17

My intention is to detect tampering with the contents of this file

If you need to detect intentional tampering with a file, you need some sort of cryptographic signature -- not just a hash.

If you can protect a key within the device, using HMAC as a signature algorithm may be sufficient. However, if the secret is extracted from the device, users will be able to use this to forge signatures.

If you cannot protect a key within the device, you will need to use an asymmetric signature algorithm. Libsodium's crypto_sign APIs provide a nice API for this. Alternatively, if you want to use the underlying algorithms directly, EdDSA is a decent choice.

Either of these options will require a relatively large amount of space (32 to 64 bytes) to be allocated for a signature, and verifying that signature will take significantly more time than verifying a noncryptographic checksum. This is largely unavoidable if you need to effectively prevent tampering.

chqrlie · Answer 2 · answered Mar 07 '19 at 07:22

For your purpose, you can use a cryptographic hash such as SHA256. It is very reliable and collisions are vanishingly unlikely, but you should test whether the speed is acceptable on your device.

There is a sample implementation in this response: https://stackoverflow.com/a/55033209/4593267

To detect intentional tampering with the data, you can add a secret key to the hashed data (a keyed hash such as HMAC-SHA256 is the standard way to do this). The device will need to have a copy of the secret key, so it is not a very secure method: the key could be extracted from the device through reverse engineering or other methods. If the device is well protected against that (for example, it sits in a secure location, uses a secure chip, or is somewhere very remote such as a satellite in space) and you are confident there are no flaws providing remote access, this may be sufficient.

Otherwise an asymmetrical cryptographic system is required, with a private key known only to the legitimate source(s) of those data files, and a public key used by the device to verify the cryptographic hash, as documented in duskwuff's answer.

SHA256 doesn't prevent tampering, though… — Mar 06 '19 at 23:14
@duskwuff: Do you have any suggestions for my use case? — Zakir, Mar 06 '19 at 23:25

score 1 · Answer 3 · answered Mar 11 '19 at 14:18

If you're only concerned about accidental or non-malicious tampering, a CRC should be sufficient.

(I'm using a somewhat circular definition of 'malicious' here: if somebody goes to the trouble of recalculating or manipulating the CRC to get their edits to work, that counts as 'malicious' and we don't defend against it.)
