11

I have noticed that a file called 'MD5' is present in the directories of many R packages I have downloaded, but I cannot find any mention of it in the 'Writing R Extensions' manual. The file lists an MD5 hash and a filename for each file in the package. What is this file used for? Should I include it in my own packages? How can it be generated?

Matthew Lueder

4 Answers

9

The MD5 hash file found in R packages is used to uniquely identify the package source on a repository (e.g. CRAN).

Specifically, when the package is listed in a repo, its metadata is added to a file called PACKAGES. When a user requests the package via install.packages(), the download is checked against the MD5 hash recorded there. The help page ?md5sum states:

MD5 sums are used as a check that R packages have been unpacked correctly and not subsequently modified.

An entry in a PACKAGES file looks like:

Package: datapkg
Version: 2.0.0
Depends: R (>= 3.2)
License: file LICENSE
MD5sum: 22797605db853f5f4c2c5612da366b53
NeedsCompilation: no
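
As a rough illustration of the comparison described above, the sketch below reads the MD5sum field from a PACKAGES-style file with read.dcf() and compares it with the hash of a local file. The tarball here is a dummy file and all paths are hypothetical; this is not a claim about what install.packages() does internally.

```r
# Hypothetical stand-in for a downloaded source tarball (dummy content).
tarball <- file.path(tempdir(), "datapkg_2.0.0.tar.gz")
writeBin(charToRaw("not a real tarball"), tarball)

# Write a PACKAGES-style entry (DCF format) that records the file's hash.
packages_file <- file.path(tempdir(), "PACKAGES")
writeLines(c("Package: datapkg",
             "Version: 2.0.0",
             paste("MD5sum:", unname(tools::md5sum(tarball)))),
           packages_file)

# Read the entry back and compare the recorded hash with a fresh one.
meta     <- read.dcf(packages_file)
expected <- unname(meta[1, "MD5sum"])
actual   <- unname(tools::md5sum(tarball))
matches  <- identical(expected, actual)
```

If the file were changed after the entry was written, `matches` would come out FALSE.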

For more information on how repos work with install.packages(), please see the post that I wrote:

http://thecoatlessprofessor.com/programming/r-data-packages-in-external-data-repositories-using-the-additional_repositories-field/

coatless
5

The file is used as input to tools::checkMD5sums(), which checks the integrity of the installed package. The format can be reverse engineered from the code: it is a text file with one line per included file, containing the MD5 hash, a space-and-asterisk separator (" *"), and the file path relative to the specified root directory. You can create these lines by hand from the output of tools::md5sum(), or you can use a function that I have provided in this Gist, where I also discuss this in more detail.
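
A minimal sketch of the "by hand" route, using only exported functions. The directory `demopkg` and its two files are hypothetical stand-ins for an installed package; the sketch writes an MD5 file in the format described above and then runs tools::checkMD5sums() before and after modifying a file.

```r
# Hypothetical throwaway directory standing in for an installed package.
pkg <- file.path(tempdir(), "demopkg")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines("Package: demopkg", file.path(pkg, "DESCRIPTION"))
writeLines("f <- function() 1", file.path(pkg, "R", "f.R"))

# File paths relative to the package root, skipping any existing MD5 file.
files <- list.files(pkg, recursive = TRUE)
files <- files[files != "MD5"]

# One "<hash> *<relative path>" line per file.
sums <- unname(tools::md5sum(file.path(pkg, files)))
writeLines(paste(sums, files, sep = " *"), file.path(pkg, "MD5"))

# TRUE while the files still match the recorded hashes.
ok_before <- tools::checkMD5sums("demopkg", dir = pkg)

# Modify one file; the check now reports the mismatch and returns FALSE.
cat("# tampered\n", file = file.path(pkg, "R", "f.R"), append = TRUE)
ok_after <- tools::checkMD5sums("demopkg", dir = pkg)
```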

hyginn
  • Very nice Gist! However, note that an internal function for generating the MD5 file does exist (see [this answer](https://stackoverflow.com/a/75277189/570918)). – merv Jan 29 '23 at 18:15
  • Thank you, and yes - you are right. Keep in mind that non-exported functions may change at any time, are typically not easily discoverable, and – since there are no comments about their intent – their use requires assumptions. – hyginn Jan 29 '23 at 22:34
1

Generating MD5 file

There is a non-exported function tools:::.installMD5sums(pkgDir, outDir = pkgDir) that can generate the MD5 file. I've commented the essential steps here:

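## excerpt of the function body: pkgDir and outDir are its arguments,
## and the working directory is pkgDir at this point

## compute the checksum of every file in the package
x <- tools::md5sum(dir(".", recursive = TRUE))

## exclude any existing "MD5" file
x <- x[names(x) != "MD5"]

## write one "<hash> *<path>" line per file to the "MD5" file
cat(paste(x, names(x), sep = " *"), sep = "\n",
    file = file.path(outDir, "MD5"))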

One can use it like:

tools:::.installMD5sums("/path/to/pkg/")
merv
-1

I arrived here from googling for how to get an md5 hash of a file. This method is simple and effective:

library(openssl)
as.character(md5(file("path/to/yourfile.txt", open = "rb")))
# [1] "30621b64b2232a67900738ab471f2067"
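
If an extra dependency is not wanted, tools::md5sum() from the tools package that ships with R should give the same kind of result. A small sketch, using a hypothetical temporary file that contains the five bytes "hello":

```r
# Hypothetical temporary file containing exactly the bytes "hello".
path <- tempfile(fileext = ".txt")
writeBin(charToRaw("hello"), path)

# md5sum() returns a character vector named by file path; drop the name.
hash <- unname(tools::md5sum(path))
hash
# [1] "5d41402abc4b2a76b9719d911017c592"
```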
stevec