I have an MD5 function that I have confirmed works correctly for both files and strings. But when I use it on variable-sized chunks of very large files, it produces identical MD5 values for chunks whose sizes are different.
Is there any realistic probability that two chunks of different lengths, possibly sharing some of the same content, produce the same MD5 fingerprint?
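
To make the setup concrete, here is a minimal Python sketch of per-chunk hashing. It is not the actual function in question: the name `md5_of_chunks`, the sample data, and the chunk sizes are all assumptions for illustration. It hashes only the bytes that belong to each chunk and creates a fresh hash object every time, which are the two things that would be worth checking in the real code.

```python
import hashlib


def md5_of_chunks(data: bytes, sizes):
    """Yield (length, hex digest) for consecutive chunks of `data`.

    `sizes` is a hypothetical list of chunk sizes in bytes.
    """
    offset = 0
    for size in sizes:
        # Slice exactly the bytes of this chunk, not a whole reused buffer.
        chunk = data[offset:offset + size]
        if not chunk:
            break
        # A fresh hash object per chunk, so no state carries over.
        yield len(chunk), hashlib.md5(chunk).hexdigest()
        offset += size


def demo():
    # Repetitive sample data, so the chunks share content but differ in length.
    data = b"abc" * 1000
    return list(md5_of_chunks(data, [300, 600, 900]))


if __name__ == "__main__":
    for length, digest in demo():
        print(length, digest)
```

With this sketch, chunks of 300, 600, and 900 bytes cut from the same repeating data give three different digests. MD5 mixes the message length into its padding, so identical digests for inputs of different sizes would be a genuine collision, which is extremely unlikely to happen by accident. Seeing it repeatedly more plausibly points to the same bytes being hashed each time, for example a buffer that is hashed in full regardless of how many bytes were read.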