I am working on a document management system written in Python. To detect changed and duplicate files, I compute a SHA-256 digest of each file and compare digests. The system can be configured to encrypt files before storage.
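A minimal sketch of this kind of hashing step, assuming the standard `hashlib` module and chunked reads so large files are not loaded into memory at once. The function name `file_sha256` and the chunk size are illustrative, not taken from the actual system:

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`.

    `chunk_size` is an arbitrary illustrative value; the file is read
    in pieces so memory use stays bounded for large documents.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The digest here is computed over the plaintext bytes, before any encryption is applied.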
The question is whether it is still safe to store the digest of the unencrypted file when the file itself is stored encrypted.
This digest is used as the identifier for a stored file and also to detect whether a file being added already exists in the system. I am okay with the probability of a SHA-256 collision for this purpose. I have also read that a SHA-256 digest cannot be used to recreate the original data.
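To make the intended use concrete, here is a rough sketch of digest-based duplicate detection, assuming an in-memory dictionary in place of the real storage backend. `DigestIndex`, `add`, and the `storage_id` values are hypothetical names used only for illustration:

```python
import hashlib


class DigestIndex:
    """Hypothetical in-memory index from plaintext SHA-256 digest to storage id."""

    def __init__(self):
        self._by_digest = {}

    def add(self, data: bytes, storage_id: str):
        """Register `data` under `storage_id`.

        Returns (digest, id, is_duplicate). If the same content was added
        before, the previously stored id is returned and nothing is overwritten.
        """
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._by_digest:
            return digest, self._by_digest[digest], True
        self._by_digest[digest] = storage_id
        return digest, storage_id, False


index = DigestIndex()
_, first_id, first_dup = index.add(b"quarterly report", "doc-1")
_, second_id, second_dup = index.add(b"quarterly report", "doc-2")
# The second add is reported as a duplicate of "doc-1".
```

In this sketch the digest is always computed over the unencrypted content, so the lookup works the same whether or not the stored copy is encrypted.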
Assuming the file cannot be reconstructed from the hash, and given that the file itself is stored in encrypted form, it should be safe to keep the hash of the original (unencrypted) file for comparisons and searching, right? Or should I rethink my strategy? These comparisons are internal to the application and will not be exposed to the user in any way.