In our system, when a user uploads a file, it is stored in a unique file system (FS) structure and a database record is generated. The file is uploaded from the web browser via XMLHttpRequest and then moved from the temporary upload area into the FS.

How can I detect, after a file has been uploaded, that it already exists in my FS? There are two cases:

1. The uploaded file is identical to one already uploaded.
2. The uploaded file is the same file, but its content has been updated, which means I need to update the file in the FS.

I am ignoring file names as a way of knowing whether the file already exists, because a file name cannot be considered unique. For example, some cameras name photos using an incrementing number that eventually rolls over. When a file is uploaded via the web browser, the source path (e.g. C:\Users\Drive\File\Uploaded\From) is masked, so I can't use that to figure out whether the file has already been uploaded.

How do I know that the file being uploaded already exists because its content is the same? Or that it exists but the uploaded copy has been changed, so that I can just update the stored file?

Microsoft Word documents create a challenge as Word regenerates the file on every save.

If the user renames a file of their own accord, I could say tough luck.

Valamas

1 Answer

I would start by finding files with identical content via a SHA hash. You could use something like the following to get a list of files that have the same hash as your newly uploaded file, and then take some action.

Just an example of getting the hash of the new file:

    string newfile;
    // Open the uploaded file read-only and hash its contents.
    using (FileStream fs = new FileStream("C:\\Users\\Drive\\File\\Uploaded\\From\\newfile.txt", FileMode.Open, FileAccess.Read))
    using (System.Security.Cryptography.SHA1Managed sha1 = new System.Security.Cryptography.SHA1Managed())
    {
        // BitConverter.ToString yields a hyphen-separated hex string.
        newfile = BitConverter.ToString(sha1.ComputeHash(fs));
    }

This goes through all files and builds a list of file names and hashes:

    var allfiles = Directory.GetFiles(@"C:\Users\Drive\File\Uploaded\From\", "*.*")
        .Select(f =>
        {
            // Dispose the stream and hasher per file so handles are not leaked.
            using (var stream = new FileStream(f, FileMode.Open, FileAccess.Read))
            using (var sha1 = new System.Security.Cryptography.SHA1Managed())
            {
                return new { FileName = f, FileHash = sha1.ComputeHash(stream) };
            }
        })
        .ToList();

This loops through them all and looks for a match to the new one.

    foreach (var fi in allfiles)
    {
        if (newfile == BitConverter.ToString(fi.FileHash))
            Console.WriteLine("Match!!!");
        Console.WriteLine(fi.FileName + ' ' + BitConverter.ToString(fi.FileHash));
    }

Ideally you would save this hash (for example, in the file's database record) when the file is uploaded, since rehashing every stored file on each upload is very expensive.
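As a rough sketch of saving the hash at upload time, the lookup could work something like the code below. The `HashIndex` class and its members are hypothetical names, the in-memory dictionary only stands in for a hash column on the database record, and SHA-256 is used here instead of the SHA-1 shown above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper; the dictionary stands in for the database table.
public static class HashIndex
{
    // Content hash -> path of the file already stored in the FS.
    private static readonly Dictionary<string, string> PathByHash =
        new Dictionary<string, string>();

    // Hash the content once, as a string suitable for a database column.
    public static string ComputeHash(Stream content)
    {
        using (var sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(content));
        }
    }

    // Returns the path of an existing file with identical content,
    // or null after registering the new file under its hash.
    public static string FindOrRegister(Stream content, string storedPath)
    {
        string hash = ComputeHash(content);
        string existing;
        if (PathByHash.TryGetValue(hash, out existing))
        {
            return existing;
        }
        PathByHash[hash] = storedPath;
        return null;
    }
}
```

With the hash stored per file, each upload costs one hash computation and one indexed lookup rather than a rescan of the whole directory.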

ericdc
  • Yes, this will catch files that have not changed (that is, uploading the same file twice). I am still looking for a way to identify a file that has changed and update it (that is, uploading a file, changing it locally, and uploading it again). – Valamas Jun 26 '13 at 00:54
  • The only way I can think of is to have your uploader page pass along some unique ID of the file that they are overwriting. – ericdc Jun 26 '13 at 00:56
  • looks like I can only prevent duplicates if the file has not changed. thanks for your examples. – Valamas Jun 26 '13 at 01:43
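
The ID-based idea from the comments could be sketched roughly as follows, assuming the uploader page sends an ID for the file alongside its content. `FileCatalog`, `UploadOutcome`, and `Apply` are hypothetical names, and the dictionary again stands in for the database.

```csharp
using System;
using System.Collections.Generic;

public enum UploadOutcome { NewFile, Unchanged, Updated }

// Hypothetical sketch; the dictionary stands in for the database.
public static class FileCatalog
{
    // File ID supplied by the uploader -> hash of the stored content.
    private static readonly Dictionary<Guid, string> HashById =
        new Dictionary<Guid, string>();

    // Compare the uploaded hash with the one stored for this ID
    // and record the new hash when the content is new or changed.
    public static UploadOutcome Apply(Guid fileId, string uploadedHash)
    {
        string storedHash;
        if (!HashById.TryGetValue(fileId, out storedHash))
        {
            HashById[fileId] = uploadedHash;
            return UploadOutcome.NewFile;
        }
        if (storedHash == uploadedHash)
        {
            return UploadOutcome.Unchanged;
        }
        HashById[fileId] = uploadedHash;
        return UploadOutcome.Updated;
    }
}
```

The ID is what ties a changed upload back to the file it replaces; the hash alone can only say whether the content is identical to something already stored.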