I've found an example online that generates a checksum by hashing the hashes of each individual file, in whatever order os.walk() lists them (which is consistent on a given system, so that's fine). I've copied the example here:
    def GetHashofDirs(directory, verbose=0):
        import hashlib, os
        SHAhash = hashlib.sha1()
        if not os.path.exists(directory):
            return -1
        try:
            for root, dirs, files in os.walk(directory):
                for names in files:
                    if verbose == 1:
                        print 'Hashing', names
                    filepath = os.path.join(root, names)
                    try:
                        f1 = open(filepath, 'rb')
                    except IOError:
                        # The file can't be opened for some reason; skip it
                        continue
                    while 1:
                        # Read the file in small chunks
                        buf = f1.read(4096)
                        if not buf:
                            break
                        SHAhash.update(hashlib.sha1(buf).hexdigest())
                    f1.close()
        except:
            import traceback
            # Print the stack traceback
            traceback.print_exc()
            return -2
        return SHAhash.hexdigest()

    print GetHashofDirs('My Documents', 1)
This works, but it doesn't give the same result as 7-Zip's checksum calculator, which is what I'm after. I realize this could be due to a difference in the order the files are visited, and also in what gets fed to the hash (this code hashes the hex digest of each 4096-byte chunk rather than the raw data), among other small differences in the algorithm. How could I change this algorithm so it generates the same checksum as 7-Zip?
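For reference, here is how I'd make the traversal order and the hashed bytes deterministic: sort directories and files explicitly, and feed the raw file bytes to a single hash instead of hashing hex digests of chunks. This is only a sketch of a reproducible directory hash, written in Python 3; I don't know 7-Zip's exact aggregation scheme, so the output would still need to be compared against 7-Zip's result.

```python
import hashlib
import os

def hash_of_dir(directory, chunk_size=4096):
    """Hash every file's raw bytes, visiting paths in sorted order.

    NOTE: a sketch of a deterministic directory hash, not a verified
    reimplementation of 7-Zip's checksum; compare outputs before relying
    on it.
    """
    sha = hashlib.sha1()
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # sorting dirs in place fixes the traversal order
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                f = open(path, 'rb')
            except OSError:
                continue  # unreadable file: skip it
            with f:
                while True:
                    buf = f.read(chunk_size)
                    if not buf:
                        break
                    sha.update(buf)  # feed raw bytes, not hex digests
    return sha.hexdigest()
```

With this scheme the result equals the SHA-1 of all file contents concatenated in sorted-path order, so it no longer depends on os.walk's native ordering.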