
I am currently writing a function for an R package. Part of what this function should do is (a) take data as input and (b) check one of its columns against a list of acceptable values.

These acceptable values come from another organization, in a .csv file. I would like to load this .csv file and use it as a reference to check whether the user's column contains valid values.

For example, let's say the user has these data:

set.seed(1839)
user <- data.frame(x=sample(letters,10),
                   y=rnorm(10))
user

   x          y
1  v -0.7025836
2  p -1.4586245
3  f  0.1987113
4  y  1.0544690
5  o -0.7112214
6  m  0.2956671
7  b  0.3016737
8  a -0.0945271
9  x -0.2790357
10 c  0.1681388

And the .csv contains many (useful) columns, but I only care about one (z) for the moment:

ref <- data.frame(z=letters[1:4], a=rnorm(4), b=(rnorm(4)))
ref

  z          a          b
1 a -0.3563105  1.4536406
2 b  1.6841862  1.3232985
3 c  1.3073516 -0.6978598
4 d  0.4352904 -0.3971175

The code I would like to run is (note: I am not calling library in the actual function, I am just doing it here for simplicity's sake):

library(dplyr)
valid_values <- ref %>%
  select(z) %>% 
  unname() %>% 
  unlist() %>% 
  as.character()

summary <- user %>% 
  mutate(x_valid=ifelse(x %in% valid_values, TRUE, FALSE))

summary tells me which values of x in user are valid:

   x          y x_valid
1  v -0.7025836   FALSE
2  p -1.4586245   FALSE
3  f  0.1987113   FALSE
4  y  1.0544690   FALSE
5  o -0.7112214   FALSE
6  m  0.2956671   FALSE
7  b  0.3016737    TRUE
8  a -0.0945271    TRUE
9  x -0.2790357   FALSE
10 c  0.1681388    TRUE

Now, what do I replace ref with in my function code? Where should I store this data in my package? How do I load it? And what type of file should I convert it to?

The function should look something like:

x_check <- function(data) {

  # get valid values
  valid_values <- ??? %>%
    select(z) %>% 
    unname() %>% 
    unlist() %>% 
    as.character()

  # compare against valid values
  return(
    data %>% 
    mutate(x_valid=ifelse(x %in% valid_values, TRUE, FALSE))
  )
}

What do I replace the ??? with to get my data? I do not care much whether or not the user is able to see this ref data I wish to load in.


I am using devtools::load_all("directory/for/my/package") to test my package. Relevant session information:

R version 3.4.0 (2017-04-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.3 (Maipo)

other attached packages:
[1] roxygen2_6.0.1             devtools_1.13.2
Mark White
  • Have you read about [How to include data in R packages](http://r-pkgs.had.co.nz/data.html)? – Gregor Thomas Jul 11 '17 at 21:09
  • Generally, you store the data in the `data/` folder, you load it using `data()` (if it's not lazy loaded). And you can use `devtools::use_data()` to set that up for you. – Gregor Thomas Jul 11 '17 at 21:11
  • @Gregor Yes, I read through Hadley's chapter on it from that link, specifically. I have stored my data in the `data/` folder and tried to use `devtools::use_data(admit_source.RData)`, where `admit_source` is the name of the file, but I received the error: `Error: Could not find package root.` – Mark White Jul 11 '17 at 21:18
  • @Gregor note that the `DESCRIPTION` file has also specified `LazyData: true` – Mark White Jul 11 '17 at 21:19
  • I think you need to follow the link a little more closely and maybe read `?use_data` - you should give `use_data` an R object, it will take care of creating the RData file. And if you have errors like that, maybe your working directory isn't set to the package folder? It seems like your question would be *"why isn't `use_data` working? How can I avoid this error?"* All the stuff about your function seems unrelated. – Gregor Thomas Jul 11 '17 at 21:56
  • @Gregor I'm not necessarily tied to `devtools::use_data`; I just want to figure out a way to access that data when someone runs the function. I may just be confused, but it seems like Hadley specifically says to give it an `.RData` file generated using `save()`. I wasn't sure if `use_data` was what I wanted anyway, because the documentation asks for an *existing* object, which is why his example involves creating an object `x <- c(1:10)`. If `use_data` takes an existing object, how do I actually put the file into an R object? That's what I want, anyway. – Mark White Jul 11 '17 at 22:09
  • @MarkWhite Sorry to mention you here, but I think [this post](https://stackoverflow.com/a/60296961/4999991) should interest you. – Foad S. Farimani Feb 19 '20 at 21:07

2 Answers


I figured it out, in case anyone comes across this in the future. I load the data from the package's data/ directory into the function's local environment:

x_check <- function(data) {

  # get reference data
  data("ref", envir=environment())

  # get valid values
  valid_values <- ref %>%
    select(z) %>% 
    unname() %>% 
    unlist() %>% 
    as.character()

  # compare against valid values
  return(
    data %>% 
    mutate(x_valid=ifelse(x %in% valid_values, TRUE, FALSE))
  )
}
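
For the data("ref") call above to work, the .csv first has to be turned into an R object and saved into data/. A minimal sketch of that step, using base R only: the file name ref.csv, its contents, and the temporary directory are stand-ins (assumptions) for the real file and for the package's data/ folder.

```r
# Hypothetical stand-in for the .csv supplied by the other organization
csv_path <- file.path(tempdir(), "ref.csv")
write.csv(data.frame(z = letters[1:4], a = 1:4), csv_path, row.names = FALSE)

# Read the .csv into an R object; the object name becomes the dataset name
ref <- read.csv(csv_path, stringsAsFactors = FALSE)

# Save it as a file holding that single object.
# In a real package the target would be "data/ref.rda" under the package root;
# a temporary directory is used here so the sketch runs anywhere.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
rda_path <- file.path(data_dir, "ref.rda")
save(ref, file = rda_path)

# Load into a fresh environment to confirm the file restores an object named "ref"
env <- new.env()
loaded <- load(rda_path, envir = env)
```

Run from the package root, devtools::use_data(ref) (passing the object, not a file name) is meant to perform the save step for you, as the comments on the question point out.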
Mark White

See Hadley Wickham's book on writing R packages, where he explains how to store data in a package.

"The most common location for package data is (surprise!) data/. Each file in this directory should be a .RData file created by save() containing a single object (with the same name as the file)."

This makes your dataset accessible to any user of your package as packagename::datasetname (with LazyData: true in DESCRIPTION; otherwise it has to be loaded with data() first).
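
As a rough sketch of how the function from the question could use this, the reference data can be a default argument that points at the packaged dataset. The package name mypkg is hypothetical, and this base-R rewrite of the check is only one possible form, not the original author's code. Because default arguments are evaluated lazily, mypkg::ref is only looked up when the caller does not pass ref, so the example below runs without the package existing.

```r
# `mypkg` is a hypothetical package name; `ref` is the dataset stored in its data/ folder
x_check <- function(data, ref = mypkg::ref) {
  # flag each value of x that appears in the reference column z
  data$x_valid <- data$x %in% as.character(ref$z)
  data
}

# Stand-in data so the sketch can be run outside the package
user <- data.frame(x = c("v", "b", "a", "c"), y = 1:4, stringsAsFactors = FALSE)
ref  <- data.frame(z = letters[1:4], stringsAsFactors = FALSE)

# Passing `ref` explicitly means the default (mypkg::ref) is never evaluated
res <- x_check(user, ref = ref)
```

Making the reference data an argument also makes the function easy to test with a small made-up table.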

Paul Rougieux