
I am trying to compare the MD5 hashes of a local file and a remote file (the same file, copied into my WAMP "www" directory), but I don't understand why the checksums don't match.

Here's the checksum code:

#-*- coding: utf-8 -*-

import hashlib
import requests

def md5Checksum(filePath,url):
    if url==None:
        with open(filePath, 'rb') as fh:
            m = hashlib.md5()
            while True:
                data = fh.read(8192)
                if not data:
                    break
                m.update(data)
            return m.hexdigest()
    else:
        r = requests.get(url, stream=True)
        m = hashlib.md5()
        for line in r.iter_lines():
            m.update(line)
        return m.hexdigest()

print "checksum_local :",md5Checksum("projectg715gb.pak",None)
print "checksum_remote :",md5Checksum(None,"http://testpangya.ddns.net/projectg715gb.pak")

And I am surprised to get this output:

checksum_local : 9d33806fdebcb91c3d7bfee7cfbe4ad7
checksum_remote : a13aaeb99eb020a0bc8247685c274e7d

The size of "projectg715gb.pak" is 14.7 MB.

But if I try with a text file (size 1 KB):

print "checksum_local :",md5Checksum("toto.txt",None)
print "checksum_remote :",md5Checksum(None,"http://testpangya.ddns.net/toto.txt")

Then it works, and I get this output:

checksum_local : f71dbe52628a3f83a77ab494817525c6
checksum_remote : f71dbe52628a3f83a77ab494817525c6

I am new to comparing MD5 hashes, so please be nice ^^' I may have made a big mistake; I don't understand why it doesn't work on big files. If someone could give me a hint, it would be super nice!

Anyway, thanks for reading and helping!

faruk13
Garbez François
  • In one you're reading 8192 bytes at a time, in the other you're reading a whole line. The line lengths probably aren't exactly 8192 bytes. Just update `fh.read(8192)` to `fh.readline()` – Nick Chapman Feb 01 '19 at 16:59
  • @NickChapman I tried that. If I print the data and the line they look more similar, but the checksums are still different :'( – Garbez François Feb 01 '19 at 17:08
  • `.iter_lines()` on the URL version discards any newline characters from the data. Use `.iter_content()` to read it in chunks, just like the file version. – jasonharper Feb 01 '19 at 18:09
  • Thanks !!! @jasonharper Works well ^^ – Garbez François Feb 01 '19 at 18:26
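
The effect described in the comments can be reproduced without a network connection. The sketch below is only an illustration: `payload` is made-up data, `bytes.splitlines()` stands in for what `iter_lines()` does to the response body (it yields lines with the line endings removed), and fixed-size slicing stands in for `iter_content()`.

```python
import hashlib


def md5_of_chunks(chunks):
    """Feed each chunk to a single MD5 object and return the hex digest."""
    m = hashlib.md5()
    for chunk in chunks:
        m.update(chunk)
    return m.hexdigest()


# Made-up payload containing newline bytes (an assumption for this demo).
payload = b"header\nbinary\x00data\nmore\r\ndata"

# Stand-in for iter_lines(): line endings are dropped before hashing.
lines = payload.splitlines()

# Stand-in for iter_content(8): fixed-size slices, no bytes dropped.
chunks = [payload[i:i + 8] for i in range(0, len(payload), 8)]

digest_full = hashlib.md5(payload).hexdigest()
digest_lines = md5_of_chunks(lines)
digest_chunks = md5_of_chunks(chunks)

print(digest_full == digest_chunks)  # chunk boundaries do not change the hash
print(digest_full == digest_lines)   # missing newline bytes do
```

A file with no newline bytes at all hashes the same either way, which may be why a small single-line text file appears to work while a large binary file does not.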

3 Answers


Thanks to the helpers, here is the final working code:

#-*- coding: utf-8 -*-

import hashlib
import requests

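def md5Checksum(filePath,url):
    m = hashlib.md5()
    if url is None:
        # Local file: hash it in 8192-byte chunks.
        with open(filePath, 'rb') as fh:
            while True:
                data = fh.read(8192)
                if not data:
                    break
                m.update(data)
            return m.hexdigest()
    else:
        # Remote file: iter_content() yields the body as bytes, newlines included.
        # stream=True avoids loading the whole body into memory first.
        r = requests.get(url, stream=True)
        for data in r.iter_content(8192):
            m.update(data)
        return m.hexdigest()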

print "checksum_local :",md5Checksum("projectg715gb.pak",None)
print "checksum_remote :",md5Checksum(None,"http://testpangya.ddns.net/projectg715gb.pak")
Garbez François

OK, it looks like I found a solution, so I will post it here :)

First, add an .htaccess file to the directory on your server where the files are stored.

Content of the .htaccess file:

ContentDigest On

Once this is set up, the server should send a Content-MD5 header in its HTTP responses.

It will look something like this:

'Content-MD5': '7dVTxeHRktvI0Wh/7/4ZOQ=='
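
Note that the header value is not the familiar hex string: Content-MD5 carries the Base64 encoding of the raw 16-byte digest. As a rough sketch (the helper names `content_md5_to_hex` and `hex_to_content_md5` are made up for illustration, and `sample` is dummy data), the two forms can be converted into each other like this:

```python
import base64
import binascii
import hashlib


def content_md5_to_hex(header_value):
    """Convert a Base64 Content-MD5 header value to a hex digest string."""
    return binascii.hexlify(base64.b64decode(header_value)).decode("ascii")


def hex_to_content_md5(hex_digest):
    """Convert a hex digest string to the Base64 form used by Content-MD5."""
    return base64.b64encode(binascii.unhexlify(hex_digest)).decode("ascii")


sample = b"hello"  # dummy data, only for the round trip below
m = hashlib.md5(sample)
header_value = base64.b64encode(m.digest()).decode("ascii")

print(content_md5_to_hex(header_value) == m.hexdigest())
print(hex_to_content_md5(m.hexdigest()) == header_value)
```

Either side can be converted, as long as both values end up in the same form before they are compared.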

Now for the Python part: I modified my code to compare this HTTP header with the local MD5 checksum.

#-*- coding: utf-8 -*-

import hashlib
import requests
import base64

def md5Checksum(filePath,url):
    if url is None:
        # Local file: hash it in 8192-byte chunks.
        m = hashlib.md5()
        with open(filePath, 'rb') as fh:
            while True:
                data = fh.read(8192)
                if not data:
                    break
                m.update(data)
        # Content-MD5 is the Base64 form of the raw digest, so encode it the same way.
        return base64.b64encode(m.digest()).decode('ascii')
    else:
        # Remote file: a HEAD request returns only the headers, not the body.
        r = requests.head(url)
        # Raises KeyError if the server does not send Content-MD5.
        return r.headers['Content-MD5']

def compare():
    local = md5Checksum("projectg502th.pak.zip",None)
    remote = md5Checksum(None,"http://127.0.0.1/md5/projectg502th.pak.zip")

    if local == remote :
        print("The soft don't download the file")
    else:
        print("The soft download the file")

print ("checksum_local :",md5Checksum("projectg_ziinf.pak.zip",None))
print ("checksum_remote : ",md5Checksum(None,"http://127.0.0.1/md5/projectg_ziinf.pak.zip"))

compare()

Output :

checksum_local : 7dVTxeHRktvI0Wh/7/4ZOQ==
checksum_remote : 7dVTxeHRktvI0Wh/7/4ZOQ==
The soft don't download the file

I hope this will help ;)

Garbez François
  • I tested this on Apache 2.4.4 on WAMP. – Garbez François Dec 22 '20 at 21:03
  • That's pretty badass. I had no idea you could do that via .htaccess. I was using a JSON file hosted on the server to get the known-good checksum of the remote file without having to download it to compare. You may also like that approach if you're parsing it in your app. See my link below if you're curious; it might be another approach you can take to post and maintain the known hash. I had a bunch of files, so I ended up combining them into one JSON file to extract inside the app. `https://wizardassistant.com/wizardassistant_app_current_release_url.json` – Mike R Dec 23 '20 at 22:14
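
The JSON approach from the last comment could look roughly like the sketch below. The manifest layout (a flat object mapping file names to SHA-256 hex digests) and the helper names are assumptions made for illustration; the real file at that URL may be structured differently. The demo builds the manifest locally so it runs without a server.

```python
import hashlib
import json
import os
import tempfile


def sha256_of_file(path, chunk_size=8192):
    """Hash a local file in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_download(path, manifest, name):
    """True if the file is missing, unlisted, or differs from the manifest hash."""
    expected = manifest.get(name)
    if expected is None or not os.path.exists(path):
        return True
    return sha256_of_file(path) != expected


with tempfile.TemporaryDirectory() as tmp:
    local_path = os.path.join(tmp, "sqlite.db")
    with open(local_path, "wb") as fh:
        fh.write(b"example database bytes")  # dummy content

    # Hypothetical manifest, as the server might publish it.
    manifest_text = json.dumps({"sqlite.db": sha256_of_file(local_path)})
    manifest = json.loads(manifest_text)

    up_to_date = not needs_download(local_path, manifest, "sqlite.db")
    missing = needs_download(os.path.join(tmp, "other.db"), manifest, "other.db")

print(up_to_date, missing)
```

In a real app the manifest text would be fetched from the server (for example with `requests.get(url).json()`), so only a small JSON file is downloaded instead of the whole remote file.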

Thanks for posting your solution https://stackoverflow.com/users/7495742/framb-axa

Was super helpful for my issue.

I slightly revised the MD5 part and the print statements for Python 3 and swapped in SHA-256 for my use case. It works well for downloading and checking a local and remote SQLite DB for an app I built. I'm leaving the code here as a reference for anyone else who stumbles on this post.

import hashlib
import os
import pathlib
import sys

import requests


# current release version url
current_release_url = 'https://somedomain.here/current_release.txt'
current_release_notes_url = 'https://somedomain.here/current_release_notes.txt'

# current database release  version url
current_db_release_url = 'https://somedomain.here/current_db_release.txt'
current_db_release_notes_url = 'https://somedomain.here/current_db_release_notes.txt'
current_db_release_notes_hash_url = 'https://somedomain.here/current_db_release_hash.txt'
current_db_release = ''
wizard_db_version = ''

# Default commands DB url
wizard_cmd_db_url = 'https://somedomain.here/sqlite.db'

wizard_cmd_db = 'some/path'

checksum_local = ''
checksum_remote = ''
checksum_remote_hash = ''
checksum_status = ''


def download_cmd_db():
    try:
        print('Downloading database update version: ' + str(current_db_release))
        url = wizard_cmd_db_url
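        r = requests.get(url)
        r.raise_for_status()  # do not save an HTTP error page as the database
        with open(wizard_cmd_db, 'wb') as f:
            f.write(r.content)

        # Retrieve HTTP meta-data
        print(r.status_code)
        # print(r.headers['content-type'])
        # print(r.encoding)
        # NOTE: `settings` is not defined in this snippet; presumably it is the app's settings object.
        settings.setValue('wizard_db_version', current_db_release)
        print('Database downloaded to:' + str(wizard_cmd_db))
    except Exception:
        # As posted, a NameError from the undefined `settings` also ends up here.
        print('Commands Database download failed.... ;( ')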


def sha256_checksum(filepath, url):
    m = hashlib.sha256()
    if url is None:
        # Local file: hash it in 8192-byte chunks.
        with open(filepath, 'rb') as fh:
            while True:
                data = fh.read(8192)
                if not data:
                    break
                m.update(data)
            return m.hexdigest()
    else:
        # Remote file: stream the body so it is not held in memory all at once.
        r = requests.get(url, stream=True)
        for data in r.iter_content(8192):
            m.update(data)
        return m.hexdigest()


def wizard_db_hash_check():
    global checksum_local, checksum_remote, checksum_status
    try:
        checksum_local = sha256_checksum(wizard_cmd_db, None)
        checksum_remote = sha256_checksum(None, wizard_cmd_db_url)
        print("checksum_local : " + checksum_local)
        print("checksum_remote: " + checksum_remote)
        print("checksum_remote_hash: " + checksum_remote_hash)

        # NOTE: checksum_remote_hash is never assigned in this snippet, so as
        # posted it is still '' here and this comparison fails. Presumably the
        # app fills it in elsewhere (e.g. from current_db_release_notes_hash_url).
        if checksum_local == checksum_remote_hash:
            print('Hash Check passed')
            checksum_status = True
        else:
            print('Hash Check Failed')
            checksum_status = False
    except Exception:
        print('Could not perform wizard_db_hash_check')


# Sanity check for missing database file
file = pathlib.Path(wizard_cmd_db)
if file.exists():
    print("DB File exists: " + wizard_cmd_db)
    wizard_db_hash_check()
else:
    print("DB File does NOT exist: " + wizard_cmd_db)
    download_cmd_db()
    wizard_db_hash_check()

# Logic to decide when to download the DB
try:
    if int(current_db_release) > int(wizard_db_version):
        print('Database update available: ' + str(current_db_release))
        download_cmd_db()
        wizard_db_hash_check()
except:
    print('Unable to check wizard_db_release')

if checksum_local != checksum_remote:
    download_cmd_db()
    wizard_db_hash_check()

# Logic to fallback to default packaged DB if no internet to download and compare hash
if checksum_status is True:
    target_db = str(wizard_cmd_db)
else:
    print('All hash checks and attempts to update commands DB have failed. Switching to bundled DB')
    target_db = os.path.join(os.path.abspath(os.path.dirname(sys.argv[0])), "sqlite.db")

print('Sanity Checks completed')
Mike R
  • The only problem I have now is that when I try this solution on a remote server, it takes a really long time to get the remote file's MD5 checksum :((( – Garbez François Dec 20 '20 at 07:11
  • Yeah, it will if the remote file is large, as it has to download the file entirely to check the hash. If this is a remote file on a server you control, you can also get the sum over SSH easily. I use the command below to get the hash over SSH, since the hashing is then done on the server. It's not ideal for confirming your local file hash, but it works to see what it should be. `ssh username@someserver "sha256sum /home/some/path/to/file"` – Mike R Dec 21 '20 at 13:40
  • I found another way to do it, just look at the next answer :) – Garbez François Dec 22 '20 at 21:11
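
The SSH command from the comments above can also be driven from Python. This is only a sketch under some assumptions: the helper names `parse_sha256sum` and `remote_sha256` are made up, key-based SSH access to the host is already configured, and `sha256sum` is available on the server. Only the parsing step is exercised in the demo, using a locally built sample line.

```python
import hashlib
import shlex
import subprocess


def parse_sha256sum(output):
    """Extract the hex digest from a line of `sha256sum` output (hash first, then path)."""
    return output.split()[0].lower()


def remote_sha256(host, remote_path):
    """Run sha256sum on the server over SSH and return the digest (hypothetical helper)."""
    cmd = ["ssh", host, "sha256sum " + shlex.quote(remote_path)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_sha256sum(result.stdout)


# Sample line in the usual "<hash>  <path>" layout, built from the hash of empty input.
sample_line = hashlib.sha256(b"").hexdigest() + "  /home/some/path/to/file\n"
parsed = parse_sha256sum(sample_line)
print(parsed == hashlib.sha256(b"").hexdigest())
```

A call such as `remote_sha256("username@someserver", "/home/some/path/to/file")` would then return a digest that can be compared with the local `hexdigest()`, without downloading the file.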