If a user uploads a file to my server, how can I tell whether the file has changed since the last time that user uploaded it?

I have a log table with the columns User_id and FileName (User_id is unique). I delete the file after reading its contents.

BalusC
Anyname Donotcare

2 Answers

You can store a hash of the file before deleting it. When the user uploads a file again, compute its hash and compare it with the stored one; if the two match, it is the same file. You can do this with one of the HashAlgorithm classes in System.Security.Cryptography, such as SHA1.

"A cryptographic hash function is a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the (cryptographic) hash value, such that an accidental or intentional change to the data will change the hash value"

Here is some example code to get you started, assuming the variable stream is a stream with your file data (you could use FileStream to open it):

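using var sha = System.Security.Cryptography.SHA1.Create(); // the hasher is IDisposable; `using` (C# 8+) releases it
byte[] hash = sha.ComputeHash(stream); // 20-byte SHA-1 digest of the stream's contents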

Now the variable hash contains the hash, a fingerprint of the file contents. Even a small change (such as a single flipped bit) produces a different hash value, while hashing the same file always returns the same hash.
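A slightly fuller sketch of the whole check, assuming the log table gets an extra (hypothetical) text column holding the hash of the user's previous upload. It uses `SHA256.Create()` instead of SHA-1 (any of the SHA variants is fine for change detection, and `SHA1.Create()` can be swapped in) and stores the digest as a Base64 string. `FileFingerprint`, `ComputeHashBase64` and `HasChanged` are illustrative names, not an existing API.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: fingerprints an uploaded file and compares the result
// with the fingerprint stored from the same user's previous upload.
public static class FileFingerprint
{
    // Hashes the stream's contents and returns the digest as a Base64 string,
    // which fits in an ordinary text column (a SHA-256 digest becomes 44 characters).
    public static string ComputeHashBase64(Stream stream)
    {
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(stream);
            return Convert.ToBase64String(hash);
        }
    }

    // Design choice (assumption): a missing previous hash counts as "changed",
    // i.e. the first upload is always treated as new.
    // The strings are compared as plain text, so nothing is ever decoded from Base64.
    public static bool HasChanged(string newHashBase64, string previousHashBase64)
    {
        return previousHashBase64 == null
            || !string.Equals(newHashBase64, previousHashBase64, StringComparison.Ordinal);
    }
}
```

In an upload handler you would open the saved file with `File.OpenRead(path)`, pass the stream to `ComputeHashBase64` before deleting the file, compare the result with the stored value using `HasChanged`, and then write the new hash back to the user's log row. Because the two Base64 strings are only compared and never passed to `Convert.FromBase64String`, the comparison itself cannot raise "Invalid length for a Base-64 char array". `ComputeHash(Stream)` reads the stream in chunks, so memory use stays small for large files, although the hashing time still grows with the file size.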

driis
  • How do I store a hash of the file? And if the user changes the contents, does that change the hash of the file? – Anyname Donotcare Nov 27 '11 at 15:13
  • Yes. Even if only 1 byte of the data changes, roughly half of the bits in the hash change. Hash algorithms are designed to behave this way. – Drona Nov 27 '11 at 15:17
  • If I remember correctly, I sometimes get the following exception when I hash some data: `Invalid length for a Base-64 char array` – Anyname Donotcare Nov 27 '11 at 15:19
  • What is the best algorithm in my case, and how can I avoid this exception? – Anyname Donotcare Nov 27 '11 at 15:20
  • Base64 has nothing to do with the hashing, but in some cases it can be practical to store the hash encoded in Base64. I updated the answer with an example. – driis Nov 27 '11 at 15:23
  • You don't need to worry too much about picking the "right" hash algorithm for this; all of the hash classes in System.Security.Cryptography are sufficient if you only need to know whether the file has changed. SHA1 or another of the SHA variants is a common choice. If this were for security purposes, picking the right algorithm would be more important. – driis Nov 27 '11 at 15:25
  • Thank you so much. One more question: does it take much time (performance-wise) to hash a large file? – Anyname Donotcare Nov 27 '11 at 15:28
  • Yes, it does take some time to calculate the hash. – driis Nov 27 '11 at 15:34
  • A less performance-intensive way of doing it would be to use the `FileInfo` class and compare `LastWriteTime` with the date you uploaded it (stored in a DB or wherever). There are tradeoffs: `LastWriteTime` (date modified) can be altered without the file contents changing, and the file contents can change while the date modified is "reset" to a time before the change. It's worth considering if it doesn't put your site's file integrity at risk. – Gavin Ward Nov 27 '11 at 17:12
  • @PunkyGuy: Thanks a lot. I thought about this method, but I couldn't get it to work: http://stackoverflow.com/questions/8285733/how-to-get-the-last-modified-date-of-the-uploaded-file – Anyname Donotcare Nov 27 '11 at 18:44

A hash is the generic kind of function. To detect changes in large chunks of data such as files, a CRC (cyclic redundancy check) is commonly used.

On Linux there is the standard utility `cksum`.

You can run `cksum filename` and capture its output. Store the result, for example in your database, and compare it when a new file comes in.
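A sketch of that approach from C#, assuming the server runs Linux (or another Unix) with `cksum` on the PATH and .NET Core 2.1 or later (for `ProcessStartInfo.ArgumentList`). The `Cksum` class and its methods are hypothetical names. A CRC is cheap to compute and reliably catches accidental changes, but it is not cryptographic: someone who wants to can craft a modified file with the same CRC.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

// Hypothetical wrapper around the POSIX `cksum` utility.
// Assumes a Unix host with `cksum` available on the PATH.
public static class Cksum
{
    // cksum prints one line: "<crc> <byte count> <file name>".
    public static (uint Crc, long Length) ParseOutput(string line)
    {
        // Split into at most 3 parts so a file name containing spaces stays intact.
        string[] parts = line.Split(' ', 3, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 2)
            throw new FormatException("Unexpected cksum output: " + line);

        uint crc = uint.Parse(parts[0], CultureInfo.InvariantCulture);
        long length = long.Parse(parts[1], CultureInfo.InvariantCulture);
        return (crc, length);
    }

    // Spawns `cksum <path>` and returns its parsed result.
    public static (uint Crc, long Length) Run(string path)
    {
        var startInfo = new ProcessStartInfo("cksum")
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,
        };
        startInfo.ArgumentList.Add(path); // passed as one argument, no manual quoting needed

        using (Process process = Process.Start(startInfo))
        {
            // Read all output before waiting, so a full pipe cannot block the child.
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException("cksum exited with code " + process.ExitCode);
            return ParseOutput(output.Trim());
        }
    }
}
```

`Cksum.Run(path)` returns the CRC and the byte count; storing both in the log table and comparing them on the next upload corresponds to the "store it and check on new file incoming" step. `ParseOutput` is kept separate so the parsing can be checked without spawning a process.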

Massimo