I have noticed that a file called 'MD5' is present in the directories of many R packages that I have downloaded. However, I cannot find any mention of it in the 'Writing R Extensions' manual. It lists the MD5 hash and filename for different files in the package. What is this file used for? Should I include it in my own packages? How can it be generated?
Have you checked out `?md5sum`? – Mark O'Connell Jun 30 '16 at 13:41
Yes. It generates MD5 hashes for a list of files you give it, but it doesn't tell you what this file is for or format the output in the same way. – Matthew Lueder Jun 30 '16 at 13:53
4 Answers
The MD5 hash file found in R packages is used to uniquely identify the package source on a repository (e.g. CRAN).
Specifically, when the package is listed in a repo, the metadata of the package is added to a file called `PACKAGES`. When a user requests a package via `install.packages()`, a download is triggered that checks for the MD5 hash. The `?md5sum` help page states:
MD5 sums are used as a check that R packages have been unpacked correctly and not subsequently modified.
An entry inside a `PACKAGES` file looks like:
Package: datapkg
Version: 2.0.0
Depends: R (>= 3.2)
License: file LICENSE
MD5sum: 22797605db853f5f4c2c5612da366b53
NeedsCompilation: no
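As a rough illustration, an entry like the one above is in DCF format, so base R's `read.dcf()` can parse it. The sketch below feeds the example entry through a text connection instead of reading a real repository index; the variable names (`entry`, `pkgs`, `repo_md5`) are made up for this example.

```r
# The example entry from above, as lines of a single DCF record
entry <- c(
  "Package: datapkg",
  "Version: 2.0.0",
  "Depends: R (>= 3.2)",
  "License: file LICENSE",
  "MD5sum: 22797605db853f5f4c2c5612da366b53",
  "NeedsCompilation: no"
)

# read.dcf() accepts a connection, so no file on disk is needed
con <- textConnection(entry)
pkgs <- read.dcf(con)
close(con)

# The result is a character matrix: one row per package, one column per field
repo_md5 <- unname(pkgs[1, "MD5sum"])
print(repo_md5)
```

In principle, the same value could be compared against `tools::md5sum()` run on a downloaded package archive, although whether a given repository fills in the `MD5sum` field is something to check for that repository.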
For more information on how repos work with `install.packages()`, please see the post that I wrote.

The `MD5sum:` line in `DESCRIPTION` is not the same thing as the package MD5 file. The file has a different purpose. See my answer below. :-) – hyginn Jan 18 '19 at 16:13
The file is used as input to `tools::checkMD5sums()`, which checks the integrity of the installed package. The format can be reverse engineered from the code: a text file that has one line for each included file, containing the MD5 hash, a ` *` separator (a space followed by an asterisk), and the file path relative to the specified root directory. You can create these lines by hand from the output of `tools::md5sum()`, or you can use a function that I have provided in this Gist, where I also discuss this in more detail.
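To make the format concrete, here is a minimal sketch that builds an `MD5` file from `tools::md5sum()` output and then validates it with `tools::checkMD5sums()`. It is not the function from the Gist: `write_md5_file()` is a hypothetical helper written for this example, and the "package" is just a throwaway directory under `tempdir()`.

```r
# Hypothetical helper (not part of the tools package): write an "MD5" file
# for the directory `pkg_dir` using only exported functions.
write_md5_file <- function(pkg_dir) {
  # Paths relative to the package root, which is what the MD5 file records
  files <- list.files(pkg_dir, recursive = TRUE)
  files <- files[files != "MD5"]  # never hash an existing MD5 file
  sums <- tools::md5sum(file.path(pkg_dir, files))
  # One line per file: hash, a space, an asterisk, then the relative path
  md5_path <- file.path(pkg_dir, "MD5")
  writeLines(paste0(unname(sums), " *", files), md5_path)
  invisible(md5_path)
}

# A throwaway directory standing in for an installed package
pkg_dir <- file.path(tempdir(), "md5_demo_pkg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines("Package: demo", file.path(pkg_dir, "DESCRIPTION"))
writeLines("f <- function() 1", file.path(pkg_dir, "R", "f.R"))

write_md5_file(pkg_dir)
ok_before <- tools::checkMD5sums("demo", dir = pkg_dir)

# Change a file after the MD5 file was written; the check should now fail
writeLines("f <- function() 2", file.path(pkg_dir, "R", "f.R"))
ok_after <- tools::checkMD5sums("demo", dir = pkg_dir)
```

When `dir` is supplied, `checkMD5sums()` reads the `MD5` file from that directory, so the package name passed here is only a placeholder. The second call also prints a message naming the file whose checksum no longer matches.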

Very nice Gist! However, note that an internal function for generating the MD5 file does exist (see [this answer](https://stackoverflow.com/a/75277189/570918)). – merv Jan 29 '23 at 18:15
Thank you, and yes - you are right. Keep in mind that non-exported functions may change at any time, are typically not easily discoverable, and – since there are no comments about their intent – their use requires assumptions. – hyginn Jan 29 '23 at 22:34
Generating MD5 file
There is a non-exported function, `tools:::.installMD5sums(pkgDir, outDir = pkgDir)`, that can generate the `MD5` file. I've commented the essential steps here:
## (the function runs these steps with the working directory set to pkgDir)
## compute checksum for every file in package
x <- md5sum(dir(".", recursive = TRUE))
## exclude any existing "MD5" file
x <- x[names(x) != "MD5"]
## write each result out to the "MD5" file
cat(paste(x, names(x), sep = " *"), sep = "\n",
    file = file.path(outDir, "MD5"))
One can use it like:
tools:::.installMD5sums("/path/to/pkg/")

I arrived here from googling for how to get an md5 hash of a file. This method is simple and effective:
library(openssl)
as.character(md5(file("path/to/yourfile.txt", open = "rb")))
# [1] "30621b64b2232a67900738ab471f2067"
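If installing `openssl` is not an option, the `tools::md5sum()` function mentioned in the comments above should give the same hash using only what ships with R. A small sketch, using a temporary file whose contents are the five bytes `hello` (the file and variable names are made up for this example):

```r
# Write exactly the bytes "hello" (no trailing newline) to a temporary file
f <- tempfile(fileext = ".txt")
writeBin(charToRaw("hello"), f)

# md5sum() returns a character vector named by file path; drop the name
hash <- unname(tools::md5sum(f))
print(hash)
# [1] "5d41402abc4b2a76b9719d911017c592"
```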
