I calculate a checksum for each file so I can compare files and find duplicates. For Office files, however, the SharePoint properties are included in the file, so two copies of the same document stored in different locations, for example, do not have the same checksum.
My idea is to open the file in a MemoryStream, unzip the content XML (word/document.xml for Word), and calculate the checksum from that, or use the CRC property that my zip library exposes for the entry. This way the document properties are excluded and only (part of) the content is hashed.
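To make the Word case concrete, here is a minimal sketch of the idea, written in Python only for illustration. The function name `content_checksum` and the choice of SHA-256 are my assumptions; any checksum would do in its place.

```python
import hashlib
import io
import zipfile


def content_checksum(file_bytes: bytes, entry_name: str = "word/document.xml") -> str:
    """Hash a single entry of an Office file, ignoring everything else in the zip.

    file_bytes is the whole .docx loaded in memory; entry_name is the path of
    the content part inside the archive (no leading slash in zip entry names).
    """
    with zipfile.ZipFile(io.BytesIO(file_bytes)) as archive:
        # Only the decompressed bytes of the content part feed the hash, so
        # docProps/* and other metadata parts do not affect the result.
        return hashlib.sha256(archive.read(entry_name)).hexdigest()
```

Two copies of the same document that differ only in their property parts should then give the same value.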
This works well for Word, but for Excel and PowerPoint the content of the document is spread over several files in a folder.
First, do you think this is the right approach? Second, how can I combine the CRC properties of the individual files into a single CRC that represents the content folder?
For Word: /word
For Excel: /xl/worksheets
For PowerPoint: /ppt/slides
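One possible way to combine the per-file CRCs, sketched in Python only for illustration: sort the entries under the folder by name and feed each entry's name, CRC-32 and uncompressed size into one hash. The function name `folder_checksum`, the use of SHA-256 as the combiner, and the inclusion of name and size are my assumptions, not something a zip library prescribes. Note that entry names inside the zip have no leading slash, so the prefix is passed as `xl/worksheets/` or `ppt/slides/`.

```python
import hashlib
import io
import zipfile


def folder_checksum(file_bytes: bytes, prefix: str) -> str:
    """Combine the stored CRC-32 values of all entries under `prefix` into one digest."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(file_bytes)) as archive:
        # Sort by name so the result does not depend on the order in which
        # the entries happen to be stored in the archive.
        entries = sorted(
            (info for info in archive.infolist()
             if info.filename.startswith(prefix) and not info.is_dir()),
            key=lambda info: info.filename,
        )
        for info in entries:
            # The CRC and size come from the zip directory, so nothing is decompressed.
            digest.update(info.filename.encode("utf-8"))
            digest.update(b"\0")  # separator between the name and the numeric fields
            digest.update(info.CRC.to_bytes(4, "big"))
            digest.update(info.file_size.to_bytes(8, "big"))
    return digest.hexdigest()
```

For example, `folder_checksum(data, "xl/worksheets/")` would cover the Excel case and `folder_checksum(data, "ppt/slides/")` the PowerPoint case. Since CRC-32 is only 32 bits, hashing the decompressed bytes of each entry instead of its CRC would be the safer variant if collisions are a concern.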