0

I have a set of files and a SHA256SUMS digest file that contains a sha256() hash for each of the files. What's the best way to verify the integrity of my files with python?

For example, here's how I would download the Debian 10 net installer SHA256SUMS digest file, then download and verify the MANIFEST file, in Bash:

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
--2020-08-25 02:11:20--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75295 (74K)
Saving to: ‘SHA256SUMS’

SHA256SUMS          100%[===================>]  73.53K  71.7KB/s    in 1.0s    

2020-08-25 02:11:22 (71.7 KB/s) - ‘SHA256SUMS’ saved [75295/75295]

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
--2020-08-25 02:11:27--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1709 (1.7K)
Saving to: ‘MANIFEST’

MANIFEST            100%[===================>]   1.67K  --.-KB/s    in 0s      

2020-08-25 02:11:28 (128 MB/s) - ‘MANIFEST’ saved [1709/1709]

user@host:~$ sha256sum --check --ignore-missing SHA256SUMS 
./MANIFEST: OK
user@host:~$ 

What is the best way to do this same operation (download and verify the integrity of the Debian 10 MANIFEST file using the SHA256SUMS file) in python?

Michael Altfield

3 Answers

1

The following python script implements a function named integrity_is_ok() that takes the path to a SHA256SUMS file and a list of files to be verified, and it returns False if any of the files couldn't be verified and True otherwise.

#!/usr/bin/env python3
from hashlib import sha256
import os

# Takes the path (as a string) to a SHA256SUMS file and a list of paths to
# local files. Returns true only if all files' checksums are present in the
# SHA256SUMS file and their checksums match
def integrity_is_ok( sha256sums_filepath, local_filepaths ):

    # first we parse the SHA256SUMS file and convert it into a dictionary
    sha256sums = dict()
    with open( sha256sums_filepath ) as fd:
        for line in fd:
            # sha256 hashes are exactly 64 characters long
            checksum = line[0:64]

            # there is one space followed by one metadata character between the
            # checksum and the filename in the `sha256sum` command output
            filename = os.path.split( line[66:] )[1].strip()
            sha256sums[filename] = checksum

    # now loop through each file that we were asked to check and confirm its
    # checksum matches what was listed in the SHA256SUMS file
    for local_file in local_filepaths:

        local_filename = os.path.split( local_file )[1]

        sha256sum = sha256()
        with open( local_file, 'rb' ) as fd:
            data_chunk = fd.read(1024)
            while data_chunk:
                sha256sum.update(data_chunk)
                data_chunk = fd.read(1024)

        checksum = sha256sum.hexdigest()
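        # a file that is not listed in SHA256SUMS cannot be verified, so
        # treat a missing entry as a failure instead of raising KeyError
        if checksum != sha256sums.get( local_filename ):
            return False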

    return True

if __name__ == '__main__':

    script_dir = os.path.split( os.path.realpath(__file__) )[0]
    sha256sums_filepath = script_dir + '/SHA256SUMS'
    local_filepaths = [ script_dir + '/MANIFEST' ]

    if integrity_is_ok( sha256sums_filepath, local_filepaths ):
        print( "INFO: Checksum OK" )
    else:
        print( "ERROR: Checksum Invalid" )

Here is an example execution:

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
--2020-08-25 22:40:16--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75295 (74K)
Saving to: ‘SHA256SUMS’

SHA256SUMS          100%[===================>]  73.53K   201KB/s    in 0.4s    

2020-08-25 22:40:17 (201 KB/s) - ‘SHA256SUMS’ saved [75295/75295]

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
--2020-08-25 22:40:32--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1709 (1.7K)
Saving to: ‘MANIFEST’

MANIFEST            100%[===================>]   1.67K  --.-KB/s    in 0s      

2020-08-25 22:40:32 (13.0 MB/s) - ‘MANIFEST’ saved [1709/1709]

user@host:~$ ./sha256sums_python.py 
INFO: Checksum OK
user@host:~$ 
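
The two wget steps could also be done from Python. The following is a minimal sketch using only the standard library; the `download()` helper name is hypothetical (it is not part of the script above), and it does no retrying or error handling beyond what `urllib` raises on its own.

```python
import shutil
import urllib.request


def download(url, dest_path):
    """Stream the resource at `url` into `dest_path` and return `dest_path`."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # Copy in chunks so large files are never held fully in memory
        shutil.copyfileobj(response, out)
    return dest_path


# Example usage, with the base URL taken from the wget commands above:
#
#   base = "http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/"
#   for name in ("SHA256SUMS", "MANIFEST"):
#       download(base + name, name)
```

Once both files are saved next to the script, they can be passed to integrity_is_ok() exactly as in the `__main__` block.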

Parts of the above code were adapted from the following answer on Ask Ubuntu:

Michael Altfield
0

You may calculate the sha256sums of each file as described in this blog post:

https://www.quickprogrammingtips.com/python/how-to-calculate-sha256-hash-of-a-file-in-python.html

A sample implementation to generate a new manifest file may look like:

import hashlib
from pathlib import Path

# Your output file
output_file = "manifest-check"

# Your target directory
p = Path('.')

with open(output_file, "w") as out:
  # Iterate over the files in the directory
  for path in p.glob("**/*"):
    # Process files only (no subdirs), skipping the output file itself
    if path.is_file() and path != Path(output_file):
      # Use a fresh hash object for each file; reusing one object would
      # accumulate the contents of every file hashed so far
      sha256_hash = hashlib.sha256()
      with open(path, "rb") as f:
        # Read the file by chunks
        for byte_block in iter(lambda: f.read(4096), b""):
          sha256_hash.update(byte_block)
      out.write(str(path) + "\t" + sha256_hash.hexdigest() + "\n")
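
Note that `sha256sum --check` expects each line to be the digest, then two spaces, then the filename (in binary mode a `*` replaces the second space), which is not the tab-separated layout used above. A minimal sketch that emits lines in that layout instead; the helper names `sha256_of()` and `sha256sums_lines()` are hypothetical, and filenames containing newlines or backslashes (which `sha256sum` escapes) are not handled.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=4096):
    """Return the hex SHA-256 digest of one file, read in chunks."""
    digest = hashlib.sha256()  # a fresh hash object per file
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def sha256sums_lines(directory):
    """Yield one '<digest>  <relative path>' line per file under `directory`."""
    root = Path(directory)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Two spaces between digest and name: the text-mode layout
            yield f"{sha256_of(path)}  {path.relative_to(root).as_posix()}"
```

Writing `"\n".join(sha256sums_lines("."))` plus a trailing newline to a file outside the scanned directory should give something that `sha256sum --check` can read back.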

Alternatively, the manifest-checker pip package seems to do this.

You may have a look at its source at https://github.com/TonyFlury/manifest-checker and adjust it for Python 3.

mabe02
  • That doesn't appear to use the `SHA256SUMS` file at all; it only calculates the hash. There's no comparison step to check if the hash matches the checksum listed in the digest file. – Michael Altfield Aug 24 '20 at 20:39
  • well, with this snippet you can calculate the sha256 checksum of each file you downloaded. you will still have to parse the MANIFEST file and compare your output with the one provided by the file. Are you trying to achieve anything different? – mabe02 Aug 24 '20 at 20:41
  • You have to parse the `SHA256SUMS` file, not the `MANIFEST`. That parsing is what's missing from your solution and it's the whole point of the question (making sure it can properly parse all valid formats of `SHA256SUMS` files generated with the `sha256sum` command) – Michael Altfield Aug 24 '20 at 20:44
  • You have copied code verbatim from an external website that says "Copyright © Quick Programming Tips. All Rights Reserved." I am going to edit the answer to get rid of it. – alani Aug 24 '20 at 20:45
  • sorry @alani I kept the reference to the source. I was not aware I could not quote other websites. Thanks – mabe02 Aug 24 '20 at 20:48
  • @mabe02 There is a difference between short "fair use" quotes and copying large chunks, which is not allowed unless you are licensed to do so. However, the one that you have put in its place is under an MIT licence so I believe that you *are* allowed to copy that one here, just not the one that you used originally. (Note I am not a lawyer - this is best-efforts advice...) – alani Aug 24 '20 at 20:49
  • @MichaelAltfield I updated the answer after your comment – mabe02 Aug 24 '20 at 21:06
  • @mabe02 afaict, that python package doesn't answer this question either. Rather, it provides an alternate method for downloading and verifying files' integrity. As my OQ states, I already have a `SHA256SUMS` file and a set of files (giving Debian's release files as an example). – Michael Altfield Aug 24 '20 at 21:14
  • Just to make sure I understand your need. You have the manifest with all the checksum provided by Debian > you want to verify that the downloaded files correspond to the source files. In that case, you can calculate the checksum of downloaded files and compare with the one provided in the manifest, right? – mabe02 Aug 24 '20 at 21:17
  • No, the `MANIFEST` file is just some random file that I downloaded. I chose it because it was small, but I guess that's caused a lot of confusion. The `SHA256SUMS` file is generated with the `sha256sum` command and contains the `sha256()` hash for all of the files they host on their server for download. I recommend reading `man sha256sum` and re-reading my question if you're still confused. Also https://help.ubuntu.com/community/HowToSHA256SUM – Michael Altfield Aug 25 '20 at 06:23
0

Python 3.11 added hashlib.file_digest()

https://docs.python.org/3.11/library/hashlib.html#file-hashing

Generating the digest for a file:

import hashlib

with open("my_file", "rb") as f:
    digest = hashlib.file_digest(f, "sha256")
    s = digest.hexdigest()

Compare s against the information you have in SHA256SUMS.
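
One way the comparison step might look, roughly mimicking `sha256sum --check --ignore-missing`. This is a sketch under assumptions: the names `parse_sha256sums()`, `file_sha256()` and `check()` are hypothetical, lines that `sha256sum` writes in its escaped form (leading backslash, for filenames containing newlines or backslashes) are simply skipped, and a chunked fallback is included only so the code also runs on interpreters older than 3.11.

```python
import hashlib
import re
from pathlib import Path

# "<64 hex digits><space><' ' for text mode or '*' for binary mode><filename>"
LINE_RE = re.compile(r"^([0-9a-fA-F]{64}) [ *](.+)$")


def parse_sha256sums(text):
    """Map each filename, as written in the digest file, to its hex digest."""
    sums = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # blank or unrecognised lines are ignored
            sums[match.group(2)] = match.group(1).lower()
    return sums


def file_sha256(path):
    """Hex SHA-256 of a file, using hashlib.file_digest() when available."""
    with open(path, "rb") as f:
        if hasattr(hashlib, "file_digest"):  # Python 3.11+
            return hashlib.file_digest(f, "sha256").hexdigest()
        digest = hashlib.sha256()  # fallback for older interpreters
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
        return digest.hexdigest()


def check(sums_path, base_dir="."):
    """Return {filename: digest_matches} for listed files found under base_dir."""
    listed = parse_sha256sums(Path(sums_path).read_text())
    results = {}
    for name, expected in listed.items():
        candidate = Path(base_dir) / name
        if candidate.is_file():  # listed but absent files are ignored
            results[name] = file_sha256(candidate) == expected
    return results
```

For the example in the question, `check("SHA256SUMS")` would return a one-entry dictionary for `./MANIFEST` if only that file has been downloaded. Keep in mind that `all(check(...).values())` is also true when no listed file was found at all, so an empty result should be treated as "nothing verified" rather than as success.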

psq