An MD5 hash should return the same value irrespective of who performs the hash calculation and where.
Yet using three different methods on the same file, we see three different answers (!?).
Here's the file.
The MD5 hash according to Amazon Web Services is:
library(dplyr)
# Fetch the response headers, pick the ETag line by name, and strip the surrounding quotes
"https://collidr-api.s3-ap-southeast-2.amazonaws.com/pfd.RDS" %>% curlGetHeaders %>%
grep("^ETag: ", ., value = TRUE, ignore.case = TRUE) %>% trimws %>%
sub("^ETag: ", "", ., ignore.case = TRUE) %>% gsub('"', "", .)
# "a921f713fbd730a51814fb6602048c16"
The MD5 hash using the digest library is:
library(digest)
digest("Downloads/pfd.RDS", algo=c("md5"))
# "2b049aba0269e46d35780c3e7d29a916"
And the MD5 hash using the openssl library is:
library(openssl)
md5("Downloads/pfd.RDS")
# "8ceabf9bdd146ed12ba89533cd593d12"
I can't explain this. I expected all three values to be the same since they're all applying the same algorithm (MD5) to the same file, yet all 3 are different.
Question
Why aren't the hash values the same irrespective of the method used to generate the MD5 hash of the file, and most importantly, how do I calculate the hash in R such that it matches the MD5 hash provided by AWS (i.e. a921f713fbd730a51814fb6602048c16)?
UPDATE
In the macOS terminal, md5 Downloads/pfd.RDS returns a921f713fbd730a51814fb6602048c16 (consistent with the AWS value). It's still not clear why the digest::digest() and openssl::md5() values are different.
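
One possible explanation, offered as a sketch rather than a confirmed diagnosis for this particular file: when given a character string, digest::digest() by default serializes and hashes the string itself, and openssl::md5() hashes the characters of the string, so neither call above reads the file. Passing file = TRUE to digest(), or a file() connection to md5(), hashes the file's bytes instead. In the sketch below, the temporary file written from mtcars is a hypothetical stand-in for Downloads/pfd.RDS, the variable names are illustrative only, and tools::md5sum() from base R serves as the reference.

```r
library(digest)
library(openssl)

# Hypothetical stand-in for Downloads/pfd.RDS so the sketch runs anywhere
path <- tempfile(fileext = ".RDS")
saveRDS(mtcars, path)

# Reference value from base R: MD5 of the file's bytes
reference <- unname(tools::md5sum(path))

# digest(): file = TRUE hashes the file contents rather than the path string
digest_file <- digest(path, algo = "md5", file = TRUE)

# openssl::md5(): a file() connection is read and hashed; the result is a
# raw vector, so it is converted to a hex string by hand here
openssl_raw <- md5(file(path))
openssl_file <- paste(as.character(unclass(openssl_raw)), collapse = "")

# Hashing the path as a string, as in the calls above, gives unrelated values
digest_string <- digest(path, algo = "md5")  # MD5 of the serialized string
openssl_string <- md5(path)                  # MD5 of the string's characters

identical(digest_file, reference)
identical(openssl_file, reference)
```

If this is what is happening here, replacing path with "Downloads/pfd.RDS" in the two file-based calls should reproduce the value reported by the terminal md5 command.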