I'm making a package for data manipulation that uses some other libraries under the hood. Let's say that my data always has a class "custom" and that I have a function custom_select() to select some columns.

I would like my package to have few dependencies but also a syntax similar to dplyr's. Because several dplyr functions are generics, I can reuse the same function names for a different input type. In my situation, I could write a method select.custom() so that the user can pass either a data.frame or a custom object to select() and both would work.

Now from my understanding, this requires putting dplyr in Imports because I need to have access to its select() generic. I'd like to avoid doing this because I want to limit the number of hard dependencies.

The scenario I have in mind is:

  • the user already loads dplyr anyway, then they can use select() with the data of class custom and it should work
  • the user doesn't have dplyr installed/loaded, and I don't want to force them to have it, so they can use the function custom_select() instead.

Ideally, I'd like to put dplyr in Suggests so that it's not strictly necessary but it adds something if the user has it.


Example

custom.R:

#' @export
#' @importFrom dplyr select
custom_select <- function(data, select) {
  print("Hello, world!")
}

#' @export
select.custom <- custom_select

NAMESPACE:

# Generated by roxygen2: do not edit by hand

export(custom_select)
export(select.custom)
importFrom(dplyr,select)

R CMD check errors if I don't put dplyr in Imports and putting it in Suggests also doesn't work (same error for both cases):

❯ checking package dependencies ... ERROR
  Namespace dependency missing from DESCRIPTION Imports/Depends entries: 'dplyr'

In summary, is there a way to keep dplyr out of hard dependencies while still providing methods for dplyr's generics if it is available?



Edit: I tried @VonC's answer but couldn't make it work. In the example below, dplyr is loaded before my custom package so select.custom() should be available but isn't:

library(dplyr, warn.conflicts = FALSE)
library(custompackage)

foo <- letters
class(foo) <- "custom"

custom_select(foo)
#> [1] "Hello, world!"
select(foo)
#> Error in UseMethod("select"): no applicable method for 'select' applied to an object of class "custom"

Here are the important files:

custom.R

#' @export
custom_select <- function(data, select) {
  print("Hello, world!")
}

if (requireNamespace("dplyr", quietly = TRUE)) {
  select.custom <- function(data, select) {
    custom_select(data, select)
  }
  utils::globalVariables("select.custom")
}

NAMESPACE

# Generated by roxygen2: do not edit by hand

export(custom_select)

DESCRIPTION (no Imports)

[...]
Suggests:
  dplyr
Mikael Jagan
bretauv
  • Sounds like you want to know if the calling environment is aware of your dependency. The data.table package does that reliably [here](https://github.com/Rdatatable/data.table/blob/88039186915028ab3c93ccfd8e22c0d1c3534b1a/R/cedta.R), so probably a good place to start... – Jthorpe Jun 19 '23 at 23:21

2 Answers

You need to put dplyr in Enhances and use .onLoad to conditionally register your method in the dplyr namespace, depending on whether dplyr is installed at load time.

nm <- package <- "TestPackage"
dir.create(file.path(package,     "R"), recursive = TRUE)
dir.create(file.path(package,   "man"), recursive = TRUE)
dir.create(file.path(package, "tests"), recursive = TRUE)

cat(file = file.path(package, "DESCRIPTION"), "
Package: TestPackage
Version: 0.0-0
License: GPL (>= 2)
Description: A (one paragraph) description of what
  the package does and why it may be useful.
Title: My First Collection of Functions
Author: First Last [aut, cre]
Maintainer: First Last <First.Last@some.domain.net>
Enhances: dplyr
")

cat(file = file.path(package, "NAMESPACE"), "
export(selectDotZzz)
")

cat(file = file.path(package, "R", paste0(nm, ".R")), "
selectDotZzz <- function(.data, ...) 0
.onLoad <- function(libname, pkgname) {
    if(requireNamespace(\"dplyr\", quietly = TRUE))
        registerS3method(\"select\", \"zzz\", selectDotZzz,
                         envir = asNamespace(\"dplyr\"))
}
")

cat(file = file.path(package, "man", paste0(nm, ".Rd")), "
\\name{whatever}
\\alias{selectDotZzz}
\\title{whatever}
\\description{whatever}
")

cat(file = file.path(package, "tests", paste0(nm, ".R")),
    sprintf("library(%s)", nm))
cat(file = file.path(package, "tests", paste0(nm, ".R")), append = TRUE, "
if(requireNamespace(\"dplyr\", quietly = TRUE))
    stopifnot(identical(dplyr::select(structure(0, class = \"zzz\")), 0))
")

getRversion()
packageVersion("dplyr")
tools:::Rcmd(c("build", package))
tools:::Rcmd(c("check", Sys.glob(paste0(nm, "_*.tar.gz"))))

unlink(Sys.glob(paste0(nm, "*")), recursive = TRUE)

The relevant output:

> getRversion()
[1] '4.3.1'
> packageVersion("dplyr")
[1] '1.1.2'
> tools:::Rcmd(c("build", package))
* checking for file 'TestPackage/DESCRIPTION' ... OK
* preparing 'TestPackage':
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building 'TestPackage_0.0-0.tar.gz'
> tools:::Rcmd(c("check", Sys.glob(paste0(nm, "_*.tar.gz"))))
* using log directory '/Users/mikael/Desktop/R-experiments/codetools/TestPackage.Rcheck'
* using R version 4.3.1 Patched (2023-06-19 r84580)
* using platform: aarch64-apple-darwin22.5.0 (64-bit)
* R was compiled by
    Apple clang version 14.0.3 (clang-1403.0.22.14.1)
    GNU Fortran (GCC) 12.2.0
* running under: macOS Ventura 13.4
* using session charset: UTF-8
* checking for file 'TestPackage/DESCRIPTION' ... OK
* this is package 'TestPackage' version '0.0-0'
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package 'TestPackage' can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking startup messages can be suppressed ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking examples ... NONE
* checking for unstated dependencies in 'tests' ... OK
* checking tests ...
  Running ‘TestPackage.R’
 OK
* checking PDF version of manual ... OK
* DONE

Status: OK
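
Adapted to the names used in the question, the same idea might look like the sketch below. The helper `register_if_available()` is hypothetical (it is not part of any package), and `utils::head()` is used only as a stand-in generic so that the snippet runs without dplyr installed; in the real package, only the `.onLoad()` definition would be needed.

```r
# Hypothetical helper: register `method` as the S3 method of `generic`
# (owned by package `pkg`) for `class`, but only if `pkg` is installed.
# Returns TRUE if the registration happened, FALSE otherwise.
register_if_available <- function(pkg, generic, class, method) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  registerS3method(generic, class, method, envir = asNamespace(pkg))
  TRUE
}

custom_select <- function(.data, ...) "Hello, world!"

# In the package, R calls .onLoad() when the namespace is loaded,
# so the check for dplyr happens at load time, not at install time.
.onLoad <- function(libname, pkgname) {
  register_if_available("dplyr", "select", "custom", custom_select)
}

# Demonstration with utils::head() standing in for dplyr::select()
ok  <- register_if_available("utils", "head", "custom", custom_select)
foo <- structure(letters, class = "custom")
res <- utils::head(foo)  # dispatches to custom_select()

# A package that is not installed is skipped without an error
missing_ok <- register_if_available("not.an.installed.pkg", "select",
                                    "custom", custom_select)
```

As an alternative worth checking against "Writing R Extensions": since R 3.6.0 the NAMESPACE directive `S3method(dplyr::select, custom)` performs delayed registration when the dplyr namespace is loaded, and roxygen2 can generate it from an `@exportS3Method dplyr::select` tag. That may remove the need for a hand-written `.onLoad()`.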

Mikael Jagan

You can do this by defining your method within an if statement that checks if the package is available, and using utils::globalVariables() to avoid R CMD check notes about undefined global functions or variables.

Altering the namespace of another package is generally not allowed according to CRAN policies, and trying to assign a function directly to dplyr::select.custom would likely not be acceptable.

The idea is therefore to assign the method to select.custom in the global environment, which will allow it to be used as a method for the generic select() function if dplyr is loaded. The key point is that you are not modifying the dplyr namespace directly.

You would need to adjust your custom.R file:

#' Custom select function
#'
#' @export
custom_select <- function(data, select) {
  print("Hello, world!")
}

if (requireNamespace("dplyr", quietly = TRUE)) {
  select.custom <- function(data, select) {
    custom_select(data, select)
  }
  utils::globalVariables("select.custom")
}

And for the NAMESPACE file, you can avoid importing anything from dplyr:

# Generated by roxygen2: do not edit by hand

export(custom_select)

In your DESCRIPTION file, you should list dplyr under Suggests:

Suggests:
    dplyr

This way, if the user has dplyr loaded, the select.custom method will be available. If dplyr is not loaded, the user can still use the custom_select() function. This approach keeps dplyr out of the hard dependencies while still providing methods for dplyr's generics if it is available.

That will create a function select.custom in your package's namespace, which will be used as a method for select() when the dplyr package is loaded. Note that this does not modify the dplyr namespace itself.
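
The question's edit reports that select(foo) still failed with this setup. A minimal sketch of one likely reason is shown here, under two assumptions: base R's format() stands in for dplyr::select(), and a plain environment (`pkg_ns`, a hypothetical name) stands in for the package namespace. A function that merely exists inside a namespace is not found by S3 dispatch unless it is registered, or visible from where the generic is called.

```r
# Stand-in for a package namespace: functions stored here are not visible
# from the caller's environment, much like unexported package internals.
pkg_ns <- new.env()
assign("format.custom_demo", function(x, ...) "<custom_demo>", envir = pkg_ns)

x <- structure(1, class = "custom_demo")

# Dispatch looks in the caller's environment and in the method registry of
# the generic's namespace. pkg_ns is neither, so the default method is used.
found_before <- identical(format(x), "<custom_demo>")

# Registering the very same function makes it visible to dispatch.
registerS3method("format", "custom_demo",
                 get("format.custom_demo", envir = pkg_ns),
                 envir = asNamespace("base"))
found_after <- identical(format(x), "<custom_demo>")
```

Here `found_before` is FALSE and `found_after` is TRUE, which mirrors the "no applicable method" error in the question: defining select.custom without exporting or registering it leaves the generic unable to see it.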

Also, it is a good idea to document that the select.custom functionality is only available if dplyr is installed and loaded, and that dplyr is a suggested, not required, dependency.

This approach should be compliant with CRAN policies, as you are not altering another package's namespace and are conditionally defining methods based on the availability of the dplyr package.


That being said, Mikael Jagan points out in the comments that "Writing R Extensions / Creating R packages / Package structure / Package Dependencies / Suggested packages" carries this warning:

WARNING: Be extremely careful if you do things which would be run at installation time depending on whether suggested packages are available or not—this includes top-level code in R code files, .onLoad functions and the definitions of S4 classes and methods.

The problem is that once a namespace of a suggested package is loaded, references to it may be captured in the installed package (most commonly in S4 methods), but the suggested package may not be available when the installed package is used (which especially for binary packages might be on a different machine).

Even worse, the problems might not be confined to your package, for the namespaces of your suggested packages will also be loaded whenever any package which imports yours is installed and so may be captured there.

VonC
  • @SamR That would not be allowed indeed. I have rewritten the answer to take that into account. – VonC Jun 18 '23 at 22:22
  • Hmm ... I would encourage a careful reading of [Section 1.1.3](https://stat.ethz.ch/R-manual/R-patched/doc/manual/R-exts.html#Package-Dependencies) (including 1.1.3.1) of WRE, notably the warning in bold at the bottom. – Mikael Jagan Jun 21 '23 at 06:34
  • @MikaelJagan That is an important caveat indeed. I have included the warning in the answer for more visibility. – VonC Jun 21 '23 at 07:10
  • Thanks but I can't make this work (see the edit at the bottom of my post) – bretauv Jun 21 '23 at 13:42