
I am saving objects of my own classes to a file (by saving their stream data).

That all works fine, but I would like to store the file's CRC checksum inside the file itself.

Then, whenever my application attempts to open a file, it can read the internally stored CRC value.

It would then compute the CRC of the actual file. If that matches the internally stored CRC value, I process the file normally; otherwise I display an error message saying the file is not valid.

I need some advice on how to do this, though. I thought I could do something like this:

  • Save the file from my application.
  • Calculate the CRC of the saved file.
  • Edit the saved file to store the CRC value in it.
  • Whenever a file is opened, check that its CRC matches the stored CRC value.

The problem is that altering even a single byte of the file produces a completely different CRC checksum, as expected. Step 3 itself alters the file, so the stored CRC no longer matches the CRC of the file that now contains it.
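A minimal sketch of the plan above makes the problem concrete. It assumes the standard zip-style CRC-32: reflected polynomial $EDB88320, with an initial value and final XOR of $FFFFFFFF. Crc32Of and AppendNaiveCrc are hypothetical helpers, not RTL routines. Once the CRC has been appended, the CRC of the whole buffer is no longer the stored value, so checking the whole file against it fails.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

const
  CrcPoly = LongWord($EDB88320); // reflected CRC-32 polynomial, as used by zip

// Hypothetical helper: bitwise CRC-32 of a whole byte buffer.
function Crc32Of(const Data: TBytes): LongWord;
var
  I, K: Integer;
  Reg: LongWord;
begin
  Reg := $FFFFFFFF;                        // standard initial register value
  for I := 0 to High(Data) do
  begin
    Reg := Reg xor Data[I];
    for K := 0 to 7 do
      if (Reg and 1) <> 0 then
        Reg := (Reg shr 1) xor CrcPoly
      else
        Reg := Reg shr 1;
  end;
  Result := not Reg;                       // standard final inversion
end;

// Hypothetical helper: the naive plan from the question. It computes the CRC
// of Data, then appends it, changing the very bytes the CRC was taken over.
function AppendNaiveCrc(const Data: TBytes): TBytes;
var
  Crc: LongWord;
  N, I: Integer;
begin
  Crc := Crc32Of(Data);
  N := Length(Data);
  SetLength(Result, N + 4);
  if N > 0 then
    Move(Data[0], Result[0], N);
  for I := 0 to 3 do
    Result[N + I] := Byte(Crc shr (8 * I)); // little-endian
end;
```

For the standard check input '123456789', Crc32Of returns $CBF43926. The 13-byte buffer returned by AppendNaiveCrc has a different CRC, which is exactly the mismatch described above.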

menjaraz
  • It might have been obvious to you Warren, but I am still very much learning Delphi and programming in general. I struggle with the logic and things most of the time, normally by confusing myself with problem situations. I think I will change my picture now, so you don't have to put me down again. @David thanks for your supportive message :) –  Dec 22 '11 at 19:05
  • Apologies, Craig. Sorry. – Warren P Dec 22 '11 at 20:47
  • @Craig, are you using the CRC32 only for error checking and not to prevent tampering? – Marcus Adams Dec 22 '11 at 21:10
  • Thanks Warren. If I could solve problems in my mind better I would do so much better, but I really struggle with problem solving :( @Marcus I just want a way to verify that the file is valid and was saved from my application, which is why I thought of a CRC check. –  Dec 22 '11 at 23:18

4 Answers


I'd generally prefer the approach where the CRC is excluded from the checking. But if that's not possible for some reason, there is a workaround:

You need to reserve 8 bytes: 4 for the CRC and 4 for compensation data. First fill the reserved bytes with a fixed dummy value (say 0x00). Then calculate the CRC of the file into the first 4 bytes. Finally, choose the other 4 bytes so that the CRC of the whole file equals the value you just stored.

For details on how to perform this calculation, see Reversing CRC32.
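Below is a sketch of the 8-byte workaround. For simplicity it assumes the reserved bytes sit at the very end of the file as a trailer: 4 CRC bytes (little-endian), then 4 compensation bytes. It uses the backward table lookup described in write-ups on reversing CRC32. The top bytes of the 256 CRC-32 table entries are all different, so the four table indices needed to reach a chosen final register can be recovered one at a time. Each compensation byte is then picked so that it selects its index. EmbedSelfCrc and ReadStoredCrc are hypothetical names.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

var
  CrcTable: array[0..255] of LongWord;
  CrcTableReady: Boolean = False;

// Builds the standard reflected CRC-32 lookup table (polynomial $EDB88320) once.
procedure EnsureCrcTable;
var
  I, K: Integer;
  C: LongWord;
begin
  if CrcTableReady then
    Exit;
  for I := 0 to 255 do
  begin
    C := LongWord(I);
    for K := 0 to 7 do
      if (C and 1) <> 0 then
        C := (C shr 1) xor LongWord($EDB88320)
      else
        C := C shr 1;
    CrcTable[I] := C;
  end;
  CrcTableReady := True;
end;

// Runs the CRC register over Data[0..Count-1] (no init or final inversion here).
function CrcUpdate(Reg: LongWord; const Data: TBytes; Count: Integer): LongWord;
var
  I: Integer;
begin
  EnsureCrcTable;
  Result := Reg;
  for I := 0 to Count - 1 do
    Result := CrcTable[(Result xor Data[I]) and $FF] xor (Result shr 8);
end;

function Crc32Of(const Data: TBytes): LongWord;
begin
  Result := not CrcUpdate($FFFFFFFF, Data, Length(Data));
end;

// Reads a little-endian 32-bit value stored at Offset.
function ReadStoredCrc(const Data: TBytes; Offset: Integer): LongWord;
var
  I: Integer;
begin
  Result := 0;
  for I := 3 downto 0 do
    Result := (Result shl 8) or Data[Offset + I];
end;

// Returns Payload + CRC (4 bytes) + compensation (4 bytes), arranged so that
// the CRC-32 of the whole result equals the CRC stored in the trailer.
function EmbedSelfCrc(const Payload: TBytes): TBytes;
var
  N, I, K: Integer;
  Crc, Reg, X: LongWord;
  Idx: array[0..3] of Byte;
begin
  EnsureCrcTable;
  N := Length(Payload);
  SetLength(Result, N + 8);
  if N > 0 then
    Move(Payload[0], Result[0], N);
  FillChar(Result[N], 8, 0);               // 1. dummy value in all reserved bytes
  Crc := Crc32Of(Result);
  for I := 0 to 3 do                       // 2. store that CRC in the first 4
    Result[N + I] := Byte(Crc shr (8 * I));
  X := not Crc;                            // 3. register wanted after the last byte
  for I := 3 downto 0 do                   //    work backwards, one index per step
  begin
    K := 0;
    while (CrcTable[K] shr 24) <> (X shr 24) do
      Inc(K);                              //    top bytes are unique: always found
    Idx[I] := K;
    X := (X xor CrcTable[K]) shl 8;
  end;
  Reg := CrcUpdate($FFFFFFFF, Result, N + 4);
  for I := 0 to 3 do                       // 4. pick bytes that select those indices
  begin
    Result[N + 4 + I] := Byte(Reg xor Idx[I]);
    Reg := CrcTable[Idx[I]] xor (Reg shr 8);
  end;
end;
```

Checking a file then needs no masking: compare Crc32Of(FileData) with ReadStoredCrc(FileData, Length(FileData) - 8). If the reserved bytes are placed in the middle of a file instead, you also have to run the CRC backwards over the data that follows them. This sketch does not cover that case.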


I actually used this in one of my projects:

I was designing a file format based on zip. The first file in the archive is stored uncompressed and serves as the header file, which also means it is stored at a fixed offset in the file. So far this is fairly standard, and similar to, for example, ePub.

Then I decided to include a SHA-1 hash in the header, both to give each file a unique content-based ID and for integrity checking. Since the header, and thus the SHA-1 hash, is at a known offset in the file, masking it when hashing is trivial. So I put in a dummy hash and create the zip file, then hash the file and fill in the real hash.

But now there is a problem: filling in the real hash changes the CRC of the header file, and zip stores the CRC of every contained file. It does so not only in one place, which would be easy to mask when SHA-1-hashing, but also in a second place at a variable offset near the end of the file. So I decided to go with CRC faking: I get my strong hash, and zip gets its valid CRC32.

And since I was already faking the CRC for the final file, I decided faking it for the original header file wouldn't hurt either. Thus all files in this format now start with a header file that has the CRC 0xD1CE0DD5.

CodesInChaos
  • If this doesn't already have a name, I'd propose: "Yin/Yang CRC Embedding". – Chris Thornton Dec 22 '11 at 19:47
  • +1 (tomorrow, out of votes now) Note that for this very reason a CRC can only be used to detect accidental errors, not malicious alteration of a file. – Johan Dec 22 '11 at 20:21

Simply put you need to exclude the bytes used to store the checksum from the checksum calculation.

Write the checksum as the last thing in the file, calculated over the contents of the file apart from the checksum. When you read the file, calculate the checksum over the contents before the checksum and compare it with the stored value. Alternatively, you could write the checksum in the first bytes of the file, seeking back to fill it in once the rest has been written. It doesn't matter where it goes, as long as you know where it is.
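Here is a minimal sketch of the trailer approach. It assumes the checksum is the CRC-32 of everything before it, stored as the last 4 bytes in little-endian order. For brevity the file contents are handled as a TBytes buffer, so a real file would first have to be read into one. AddCrcTrailer and HasValidCrcTrailer are hypothetical helpers.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

// Bitwise CRC-32 (zip polynomial) over the first Count bytes of Data.
function Crc32Of(const Data: TBytes; Count: Integer): LongWord;
var
  I, K: Integer;
  Reg: LongWord;
begin
  Reg := $FFFFFFFF;
  for I := 0 to Count - 1 do
  begin
    Reg := Reg xor Data[I];
    for K := 0 to 7 do
      if (Reg and 1) <> 0 then
        Reg := (Reg shr 1) xor LongWord($EDB88320)
      else
        Reg := Reg shr 1;
  end;
  Result := not Reg;
end;

// Returns Data followed by the CRC of Data (4 bytes, little-endian).
function AddCrcTrailer(const Data: TBytes): TBytes;
var
  N, I: Integer;
  Crc: LongWord;
begin
  N := Length(Data);
  Crc := Crc32Of(Data, N);                 // the checksum covers Data only
  SetLength(Result, N + 4);
  if N > 0 then
    Move(Data[0], Result[0], N);
  for I := 0 to 3 do
    Result[N + I] := Byte(Crc shr (8 * I));
end;

// Recomputes the CRC over everything except the last 4 bytes and
// compares it with the value stored in those 4 bytes.
function HasValidCrcTrailer(const FileData: TBytes): Boolean;
var
  N, I: Integer;
  Stored: LongWord;
begin
  Result := False;
  N := Length(FileData) - 4;
  if N < 0 then
    Exit;                                  // too short to hold a checksum
  Stored := 0;
  for I := 3 downto 0 do
    Stored := (Stored shl 8) or FileData[N + I];
  Result := Crc32Of(FileData, N) = Stored;
end;
```

Keeping the checksum out of its own calculation is the whole trick. The Count parameter lets the same routine hash "everything but the trailer" without copying the buffer.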

David Heffernan
  • Thanks, that seems clear now. I could read the value I saved in the file into a variable; then, when I calculate the CRC of the file (excluding the bytes that hold that value), if the result matches the variable I know the file is valid. –  Dec 22 '11 at 19:20
  • @Craig Yep, that would do it. If the program would still function without checksums then Craig Peterson's ADS idea is excellent. If you want cross-platform then it's less appropriate because it relies on NTFS. – David Heffernan Dec 22 '11 at 19:23

Store the CRC as part of the file itself, but don't include its bytes in the CRC calculation. If you have some sort of fixed header, zero out the CRC field before passing the data to the CRC function. If not, just append the CRC to the end of the file and pass everything but the last 4 bytes into the CRC function.
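For the fixed-header variant, the sketch below assumes a hypothetical layout: a 4-byte magic value followed by a 4-byte CRC field at offset 4 (CrcFieldOffset). Both writing and checking compute the CRC over a copy of the data in which that field is zeroed. As a result, the stored value never feeds into its own calculation. WithCrcStamped and CrcFieldMatches are illustrative names.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

const
  CrcFieldOffset = 4; // hypothetical layout: 4-byte magic, then a 4-byte CRC field

// Bitwise CRC-32 (zip polynomial, standard init and final inversion).
function Crc32Of(const Data: TBytes): LongWord;
var
  I, K: Integer;
  Reg: LongWord;
begin
  Reg := $FFFFFFFF;
  for I := 0 to High(Data) do
  begin
    Reg := Reg xor Data[I];
    for K := 0 to 7 do
      if (Reg and 1) <> 0 then
        Reg := (Reg shr 1) xor LongWord($EDB88320)
      else
        Reg := Reg shr 1;
  end;
  Result := not Reg;
end;

// CRC of the data as if the CRC field held zeros; works on a copy.
function CrcWithFieldZeroed(const FileData: TBytes): LongWord;
var
  Masked: TBytes;
  I: Integer;
begin
  Masked := Copy(FileData, 0, Length(FileData));
  for I := 0 to 3 do
    Masked[CrcFieldOffset + I] := 0;
  Result := Crc32Of(Masked);
end;

// Returns a copy of FileData with the CRC field filled in (little-endian).
function WithCrcStamped(const FileData: TBytes): TBytes;
var
  Crc: LongWord;
  I: Integer;
begin
  if Length(FileData) < CrcFieldOffset + 4 then
    raise Exception.Create('Data too short to hold the header CRC field');
  Crc := CrcWithFieldZeroed(FileData);
  Result := Copy(FileData, 0, Length(FileData));
  for I := 0 to 3 do
    Result[CrcFieldOffset + I] := Byte(Crc shr (8 * I));
end;

// True if the stored field matches the CRC recomputed with the field zeroed.
function CrcFieldMatches(const FileData: TBytes): Boolean;
var
  Stored: LongWord;
  I: Integer;
begin
  Result := False;
  if Length(FileData) < CrcFieldOffset + 4 then
    Exit;
  Stored := 0;
  for I := 3 downto 0 do
    Stored := (Stored shl 8) or FileData[CrcFieldOffset + I];
  Result := CrcWithFieldZeroed(FileData) = Stored;
end;
```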


Alternatively, if the files are stored on an NTFS drive and you don't need to transfer them to another computer you can use NTFS Alternate Data Streams to store the CRCs. Basically you open the file with the ADS name separated from the filename by a colon (like C:\file.txt:CRC). Windows handles the difference internally, so you can use plain TFileStream functions to manipulate them.

Alternate data streams are stored separately from the standard file stream, so opening or modifying just C:\file.txt won't affect it.

So, the code would look like this:

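// Needs Classes (TStream, TFileStream) and SysUtils in the uses clause.
// CrcOf(Stream): LongWord is a CRC routine you supply; it is not part of the RTL.
procedure UpdateCRC(const aFileName: string);
var
  FileStream, ADSStream: TStream;
  CRC: LongWord;
begin
  // Checksum the main (unnamed) data stream; other readers are still allowed.
  FileStream := TFileStream.Create(aFileName, fmOpenRead or fmShareDenyWrite);
  try
    CRC := CrcOf(FileStream);
  finally
    FileStream.Free;
  end;

  // 'name:CRC' addresses an alternate data stream on NTFS;
  // creating or overwriting it leaves the main stream untouched.
  ADSStream := TFileStream.Create(aFileName + ':CRC', fmCreate);
  try
    ADSStream.WriteBuffer(CRC, SizeOf(CRC));
  finally
    ADSStream.Free;
  end;
end;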
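The answer leaves CrcOf undefined. Below is one possible implementation, plus a hypothetical CheckCRC counterpart that reads the ':CRC' stream back and compares it. The sketch assumes the zip-style CRC-32 and the single 4-byte LongWord written by UpdateCRC. CheckCRC treats a missing ':CRC' stream as "not valid". On volumes or systems without alternate data streams, the ':CRC' name will typically either fail to open or refer to an ordinary file.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

// One possible CrcOf: bitwise CRC-32 over the whole stream, read in 4 KB chunks.
function CrcOf(Stream: TStream): LongWord;
var
  Buf: array[0..4095] of Byte;
  Got, I, K: Integer;
  Reg: LongWord;
begin
  Reg := $FFFFFFFF;
  Stream.Position := 0;                    // always checksum from the start
  repeat
    Got := Stream.Read(Buf, SizeOf(Buf));
    for I := 0 to Got - 1 do
    begin
      Reg := Reg xor Buf[I];
      for K := 0 to 7 do
        if (Reg and 1) <> 0 then
          Reg := (Reg shr 1) xor LongWord($EDB88320)
        else
          Reg := Reg shr 1;
    end;
  until Got = 0;
  Result := not Reg;
end;

// Hypothetical counterpart to UpdateCRC: True only if the ':CRC' stream exists,
// holds exactly one LongWord, and that value matches the file's current CRC.
function CheckCRC(const aFileName: string): Boolean;
var
  FileStream, ADSStream: TStream;
  Actual, Stored: LongWord;
begin
  Result := False;
  FileStream := TFileStream.Create(aFileName, fmOpenRead or fmShareDenyWrite);
  try
    Actual := CrcOf(FileStream);
  finally
    FileStream.Free;
  end;
  try
    ADSStream := TFileStream.Create(aFileName + ':CRC', fmOpenRead or fmShareDenyWrite);
  except
    on EFOpenError do
      Exit;                                // no stored CRC: report the file as invalid
  end;
  try
    if ADSStream.Size = SizeOf(Stored) then
    begin
      ADSStream.ReadBuffer(Stored, SizeOf(Stored));
      Result := Stored = Actual;
    end;
  finally
    ADSStream.Free;
  end;
end;
```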

If you need to find all of the alternate data streams attached to a file (there can be more than one), you can iterate over them using BackupRead. Internet Explorer uses ADSs to support the "This file has been downloaded from the Internet. Are you sure you want to open it?" prompt.

Zoë Peterson

I would recommend storing the checksum in another file, maybe a .ini file. Or, for a really weird idea, you could incorporate the checksum into the filename, e.g.:
MyFile_checksum_digits_here.dat
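Here is a sketch of the sidecar .ini variant, using TIniFile from the IniFiles unit. The section name, the helper names, and keying entries by the data file's bare name are all assumptions. Under this scheme, files with the same name in different folders would collide. The path goes through ExpandFileName because, on Windows, TIniFile given a bare file name targets the Windows directory.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, IniFiles;

const
  CrcSection = 'Checksums'; // hypothetical section name

// Records Crc for DataFile in the sidecar .ini as 8 hex digits.
procedure SaveCrcToIni(const DataFile, IniFile: string; Crc: LongWord);
var
  Ini: TIniFile;
begin
  Ini := TIniFile.Create(ExpandFileName(IniFile)); // full path, see lead-in
  try
    Ini.WriteString(CrcSection, ExtractFileName(DataFile), IntToHex(Int64(Crc), 8));
  finally
    Ini.Free;
  end;
end;

// True if the .ini holds an entry for DataFile equal to Actual.
function StoredCrcMatches(const DataFile, IniFile: string; Actual: LongWord): Boolean;
var
  Ini: TIniFile;
  S: string;
begin
  Ini := TIniFile.Create(ExpandFileName(IniFile));
  try
    S := Ini.ReadString(CrcSection, ExtractFileName(DataFile), '');
  finally
    Ini.Free;
  end;
  // A missing or malformed entry yields -1, which never equals a LongWord.
  Result := StrToInt64Def('$' + S, -1) = Actual;
end;
```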

Chris Thornton
  • That would not work, as the filename is not set by my code; the user chooses it through a TSaveDialog. –  Dec 22 '11 at 19:07