When debugging a complicated script, I often need to source it repeatedly so that the RStudio breakpoints are active. In this setting, I do not want to install packages that are already loaded (attached?), because RStudio strongly recommends restarting R whenever you try to install a package that is already loaded (attached?), and some of my code is being tested on large (roughly 10-30 GB) data files that take a long time to read into R.

Until now, I have never had to distinguish between packages that are loaded and those that are attached, as I have always done both or neither. But I am now trying to write a function that avoids installing packages that are loaded (attached?), and I am trying to understand the relative implications of avoiding reinstalling only loaded packages, only attached packages, only packages that are both loaded and attached, or only packages that are either loaded or attached.

The code below is supposed to update all packages and then install every package, taken from a vector of (perhaps) new packages, that is either uninstalled, or unattached, or both. However, if a package is loaded but neither attached nor installed (if that is possible), then it will not be installed. Will that limitation cause problems under any plausible circumstances?

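install.packs <- function(pks, ...){
  update.packages(ask = FALSE)
  # Packages in pks that are not installed in any library on .libPaths()
  uninstalled <- pks[!(pks %in% rownames(installed.packages()))]
  # Packages in pks that are not currently attached (on the search path)
  unattached  <- pks[!(pks %in% .packages())]
  new_pks     <- unique(c(uninstalled, unattached))
  # Install the combined set; extra arguments in ... go to install.packages() only
  if (length(new_pks) > 0)
    install.packages(new_pks, repos = "https://cloud.r-project.org/", ...)
}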
andrewH
  • Why not use `require` and `requireNamespace` to identify whether a package you need to attach/load is already installed? – mikeck Nov 03 '18 at 06:53
  • A package can't be loaded unless it is installed. If it is attached, it is also loaded. So "attached" implies "loaded", "loaded" implies "installed". It is a bad idea to reinstall a loaded package. RStudio is pretty quick about restarting (you don't need to reload the data yourself, it is reloaded with the workspace), but for a really large dataset it could be slow. I think you just need to live with that if you want reliable debugging on huge datasets. – user2554330 Nov 03 '18 at 11:14
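
To make the installed / loaded / attached distinction from the comment above concrete, here is a minimal sketch in base R. The helper names `pkg_status` and `safe_to_install` are hypothetical (they are not from the question or the answer), and the sketch assumes that "loaded" means the namespace appears in `loadedNamespaces()` and "attached" means `package:<name>` appears on `search()`.

```r
# Hypothetical helper (not from the original post): report, for each package
# name, whether it is installed, loaded, and attached in the current session.
pkg_status <- function(pks) {
  # find.package() with quiet = TRUE returns nothing for names it cannot find
  installed <- vapply(
    pks,
    function(p) length(find.package(p, quiet = TRUE)) > 0,
    logical(1)
  )
  data.frame(
    package   = pks,
    installed = unname(installed),
    loaded    = pks %in% loadedNamespaces(),            # namespace is in memory
    attached  = paste0("package:", pks) %in% search(),  # on the search path
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: names whose namespace is not loaded in this session,
# so installing them should not disturb anything already in memory.
safe_to_install <- function(pks) setdiff(pks, loadedNamespaces())

# "notARealPackage123" is a made-up name used only to show the all-FALSE case
st <- pkg_status(c("base", "tools", "notARealPackage123"))
print(st)
```

Every row should satisfy "attached implies loaded" and "loaded implies installed". If the goal is to avoid touching loaded packages, one option might be to pass only the result of `safe_to_install()` to `install.packages()`, and to restrict `update.packages()` through its `oldPkgs` argument in the same way.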

1 Answer

While this does not entirely answer your question, you can run a function before you start debugging that scans the script for the packages it uses and installs any that are missing. In addition, to avoid version conflicts between packages (e.g. if the script uses ggplot2_2.0.0 but you have ggplot2_3.0.0 installed), you can use the checkpoint package.

For example:

InstallNeededPackages <- function(path) {

  # Install the checkpoint package if it is not already available
  if (!requireNamespace("checkpoint", quietly = TRUE)) install.packages("checkpoint")

  # Get the list of all packages used in the script
  PkgsInScript <- checkpoint::scanForPackages(path,
                                              use.knitr = TRUE)

  # Find the packages that are not yet installed
  list.of.packages <- PkgsInScript$pkgs
  new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()
                                     [,"Package"])]

  # Install only the missing packages, if there are any
  if (length(new.packages) > 0) install.packages(new.packages)

}
DJV