I am trying to build an application that needs to compare the MD5 hash of any file. Due to specific constraints, the MD5 must be generated client side before the upload, and after the upload the application needs to check it server side.
My first approach was to use the JavaScript File API and the FileReader.readAs* functions on the client side. Then I use the MD5 algorithm found here: http://pajhome.org.uk/crypt/md5/
Server side, I would use PHP's fopen and md5 functions.
This approach works fine with simple text files. But when a binary file is used (such as a jpg or pdf), the MD5 generated at the client side is different from the one generated on the server. Using the md5sum command-line tool, I figured out that the server MD5 is correct and the problem occurs at the client side.
I've tried other MD5 APIs I found, with the same results. I suspect that the FileReader.readAs* functions are loading the file content slightly differently (I have tried all the readAs* variants: text, binary and so on), but I can't figure out what the difference is.
I'm missing something but don't know what; maybe I need to decode the content somehow before generating the MD5.
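To illustrate the suspicion above, here is a minimal sketch of how a text-oriented read can change binary content. It does not use FileReader itself; it assumes that reading as text behaves like decoding the bytes as UTF-8, and it uses the standard TextDecoder and TextEncoder classes. The sample bytes and the helper name bytesToBinaryString are made up for the example.

```javascript
// Sample bytes of the kind found at the start of a JPEG (illustrative only).
const original = new Uint8Array([0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10]);

// Assumption: a "read as text" step decodes the bytes as UTF-8.
// Bytes that are not valid UTF-8 are replaced with U+FFFD.
const asText = new TextDecoder("utf-8").decode(original);

// Encoding the text again does not give the original bytes back.
const roundTripped = new TextEncoder().encode(asText);

// Hypothetical helper: a byte-preserving "binary string",
// one character per byte, each with a char code in 0..255.
function bytesToBinaryString(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

const binary = bytesToBinaryString(original);

console.log("original bytes:    ", original.length);
console.log("after text decode: ", roundTripped.length);
console.log("binary string len: ", binary.length);
```

If the two lengths printed first differ, any hash computed over the decoded text cannot match one computed over the raw bytes.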
Any tips?
Edit 1:
I followed the idea given by optima1: I took each character and printed its Unicode number in both JavaScript and PHP. In every case I could see only one difference, at the end (I used vimdiff).
PHP: 54 51 10 37 37 69 79 70 0
JavaScript: 54 51 10 37 37 69 79 70
Maybe this extra zero in PHP is some kind of string terminator. In both cases the binary strings have the same length. Adding a String.fromCharCode(0) to the end of the JS content does not solve the problem. I will keep investigating.
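The comparison of the two dumps can also be scripted instead of done by eye. This is only a sketch: the function name firstDifference is made up, and the two arrays are just the tails quoted above, not full file dumps.

```javascript
// Hypothetical helper: index of the first position where two
// char-code sequences differ, or -1 if they are identical.
function firstDifference(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) return i;
  }
  // Same prefix: they differ only if one sequence is longer.
  return a.length === b.length ? -1 : shared;
}

// The tails printed by PHP and JavaScript in this edit.
const phpCodes = [54, 51, 10, 37, 37, 69, 79, 70, 0];
const jsCodes = [54, 51, 10, 37, 37, 69, 79, 70];

const idx = firstDifference(phpCodes, jsCodes);
console.log("first difference at index", idx);
```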
If I can't find a solution, I will try to build a giant string by concatenating those char codes and use it to build the MD5. It is a crap solution but will serve for now, and I will just need to add a zero to the end of the JS string...
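A rough sketch of that fallback, building a string from a list of char codes. The function name charCodesToString and the chunk size are arbitrary choices for the example; the chunking is there because passing a very large array to String.fromCharCode.apply in one call can exceed the engine's argument limit.

```javascript
// Hypothetical helper: build a string from an array of char codes,
// converting a limited number of codes per call.
function charCodesToString(codes, chunkSize = 8192) {
  let result = "";
  for (let i = 0; i < codes.length; i += chunkSize) {
    const chunk = codes.slice(i, i + chunkSize);
    result += String.fromCharCode.apply(null, chunk);
  }
  return result;
}

// The char codes printed on the JavaScript side in Edit 1.
const codes = [54, 51, 10, 37, 37, 69, 79, 70];
const rebuilt = charCodesToString(codes);

console.log(JSON.stringify(rebuilt));
```

The resulting string could then be passed to whichever MD5 function is in use.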
Edit 2:
Thank God! This implementation works like a charm: http://www.myersdaily.org/joseph/javascript/md5.js
If you need to generate an MD5 hash from binary files, go for it.
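One possible explanation for why one implementation matches the server and another does not, offered only as a guess: some string-based hash helpers UTF-8 encode their input before hashing, which changes every byte above 127 in a binary string. Whether either library mentioned here does this is an assumption I have not verified. The sketch below shows only the effect on the bytes; the sample values and the helper name binaryStringToBytes are made up.

```javascript
// A binary string: every char code is a byte value in 0..255.
const binaryString = String.fromCharCode(0x25, 0x50, 0x44, 0x46, 0xe2, 0xe3);

// Assumed pre-processing step: UTF-8 encode the string before hashing.
// Characters above 127 become two bytes each.
const utf8Bytes = new TextEncoder().encode(binaryString);

// Hypothetical byte-preserving alternative: one byte per character.
function binaryStringToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i) & 0xff;
  }
  return bytes;
}

const rawBytes = binaryStringToBytes(binaryString);

console.log("UTF-8 encoded length:", utf8Bytes.length);
console.log("raw byte length:     ", rawBytes.length);
```

If the two lengths differ for a given file, a hash over the UTF-8 encoded form cannot equal the md5sum of the file on disk.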
Thanks in advance!