I've found an example online that generates a checksum by hashing the hashes of each individual file, in whatever order os.walk() lists them (which is consistent on a given system, so that's fine). I've copied the example here:
    def GetHashofDirs(directory, verbose=0):
        import hashlib, os
        SHAhash = hashlib.sha1()
        if not os.path.exists(directory):
            return -1
        try:
            for root, dirs, files in os.walk(directory):
                for names in files:
                    if verbose == 1:
                        print 'Hashing', names
                    filepath = os.path.join(root, names)
                    try:
                        f1 = open(filepath, 'rb')
                    except IOError:
                        # The file can't be opened for some reason; skip it
                        continue
                    while 1:
                        # Read the file in small chunks
                        buf = f1.read(4096)
                        if not buf:
                            break
                        SHAhash.update(hashlib.sha1(buf).hexdigest())
                    f1.close()
        except:
            import traceback
            # Print the stack traceback
            traceback.print_exc()
            return -2
        return SHAhash.hexdigest()

    print GetHashofDirs('My Documents', 1)
This works, but it doesn't give the same result as 7-Zip's checksum calculator, which is what I'm after. I realize this could be due to a difference in the order the files are visited, and also in what gets fed to the hash (this code hashes the hex digest of each 4096-byte chunk rather than the raw data), among other small differences in the algorithm. How could I change this algorithm so it generates the same checksum as 7-Zip?
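For reference, here is how I'd make the traversal order and the hashed bytes deterministic: sort directories and files explicitly, and feed the raw file bytes to a single hash instead of hashing hex digests of chunks. This is only a sketch of a reproducible directory hash, written in Python 3; I don't know 7-Zip's exact aggregation scheme, so the output would still need to be compared against 7-Zip's result.

```python
import hashlib
import os

def hash_of_dir(directory, chunk_size=4096):
    """Hash every file's raw bytes, visiting paths in sorted order.

    NOTE: a sketch of a deterministic directory hash, not a verified
    reimplementation of 7-Zip's checksum; compare outputs before relying
    on it.
    """
    sha = hashlib.sha1()
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # sorting dirs in place fixes the traversal order
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                f = open(path, 'rb')
            except OSError:
                continue  # unreadable file: skip it
            with f:
                while True:
                    buf = f.read(chunk_size)
                    if not buf:
                        break
                    sha.update(buf)  # feed raw bytes, not hex digests
    return sha.hexdigest()
```

With this scheme the result equals the SHA-1 of all file contents concatenated in sorted-path order, so it no longer depends on os.walk's native ordering.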