I am creating a package for work (called mypackage) and have a function I would like to include in it. This function seems to work perfectly fine when I call it like this:

myFunction()

But it fails when I call it through the package namespace like this:

mypackage::myFunction()

I am using roxygen2 to build my package. The function looks like this:

volumeDiffBoot.test <- function(screenedData, B=100, recSetting=c(80, 6), 
                                curSetting=c(80, 6), numCore=3){
  inputStrings <- unique(screenedData$ID)
  cl <- makeCluster(numCore)
  # must pass all relevant variables to the worker nodes: 
  clusterExport(cl, list=c("inputStrings", "B", "screenedData", "curSetting", "recSetting"), 
                envir=environment())
  clusterEvalQ(cl, library(data.table))
  # change the data.frame to a data.table (MUCH faster this way)
  b <- as.data.table(screenedData)
  setkey(b, ID) # set the key for faster subsetting
  # bootstrap sampling of volume differences:
  bootSamples <- parLapply(cl, as.matrix(1:B), function(i){
    bootSample1 <- sample(inputStrings, replace=TRUE)
    bootSample2 <- sample(inputStrings, replace=TRUE)
    numHits <- lapply(1:length(bootSample1), function(j){
      # subsets the data by ID first using data.table key (much faster this way): 
      d1 <- b[list(bootSample1[j])]
      # return the number of rows meeting the accuracy and variation conditions: 
      curHits <- d1[accuracy >= curSetting[1] & numVariation <= curSetting[2], .N]

      d2 <- b[list(bootSample2[j])]
      recHits <- d2[accuracy >= recSetting[1] & numVariation <= recSetting[2], .N]
      return(c(curHits, recHits))
    })
    q <- do.call(rbind, numHits)
    return(sum(q[,1]) - sum(q[,2]))
  })
  stopCluster(cl) # close the cluster
  bootSamples <- unlist(bootSamples)
  cat("If the following confidence interval contains zero, the difference in volume is not significant.\n")
  print(quantile(bootSamples, c(0.025, 0.975)))
  return(bootSamples)
}

Here is some data to use:

myDat <- structure(list(accuracy = c(0L, 0L, 100L, 100L, 100L, 100L, 100L, 
100L, 85L, 73L, 0L, 0L, 90L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 100L, 
100L, 100L, 94L, 100L), ID = c(1016L, 1017L, 1019L, 1014L, 1016L, 
1010L, 1003L, 1005L, 1008L, 1015L, 1016L, 1008L, 1006L, 1012L, 
1001L, 1004L, 1011L, 1009L, 1010L, 1007L, 1008L, 1006L, 1002L, 
1014L, 1019L), numVariation = c(15, 11, 0, 0, 0, 0, 0, 0, 2, 
4, 14, 10, 1, 8, 9, 9, 15, 15, 14, 11, 0, 0, 0, 1, 0)), .Names = c("accuracy", 
"ID", "numVariation"), row.names = c(NA, 25L), class = "data.frame")

Here is my sessionInfo():

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] myPackage_0.1 ggplot2_1.0.1     data.table_1.9.4  dplyr_0.4.2       stringr_1.0.0     doSNOW_1.0.12    
 [7] snow_0.3-13       iterators_1.0.7   foreach_1.4.2     digest_0.6.8     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.0      magrittr_1.5     MASS_7.3-43      munsell_0.4.2    colorspace_1.2-6 R6_2.1.1         plyr_1.8.3      
 [8] tools_3.2.1      grid_3.2.1       gtable_0.1.2     DBI_0.3.1        assertthat_0.1   reshape2_1.4.1   codetools_0.2-11
[15] stringi_0.5-5    scales_0.3.0     chron_2.3-47     proto_0.3-10    

The error being thrown when called via the package is:

> d <- mypackage::volumeDiffBoot.test(myDat, B=3, recSetting = c(88, 2), curSetting = c(80, 6))
Error in checkForRemoteErrors(val) : 
  3 nodes produced errors; first error: invalid subscript type 'list'
statsNoob
  • The "I see you're using `data.table`" shot in the dark is to include `import(data.table)` in the package's `NAMESPACE` file. – Akhil Nair Sep 03 '15 at 14:04
  • I have never touched my NAMESPACE file. I just opened it up and it said . I just tried adding it to Imports:... in the DESCRIPTION file and it did not work.. – statsNoob Sep 03 '15 at 14:07
  • is it that clusterExport needs to have `varlist`, not `list` as its argument? – jeremycg Sep 03 '15 at 14:08
  • @jeremycg, I have tried that. I also tried just having it as a character vector (which seems to be the way most people do it). – statsNoob Sep 03 '15 at 14:10
  • I'm not sure why you tried to add it to the `DESCRIPTION` file, when I said the `NAMESPACE` one. – Akhil Nair Sep 03 '15 at 14:12

1 Answer

Continuing on from my comment now that you added the error message:

Include import(data.table) in the NAMESPACE file for the package, underneath the exportPattern("^[^\\.]") line, then rebuild your package.

I had a similar problem. The error mentions a list, which points at the data.table syntax b[list(bootSample1[j])]: that syntax is not understood when the function is called from your package (I'm sure the terminology here is wrong, but that's just a sign that I don't deeply understand the issue).
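To illustrate where that message can come from, here is a minimal sketch in base R only. It assumes, as the FAQ entry quoted further down suggests, that inside a package that is not data.table-aware the `[` call is handled with data.frame semantics. The small data frame `df` is made up for the illustration and is not the asker's data.

```r
# A plain data.frame standing in for how `b` is treated when the calling
# package is not data.table-aware (assumption: `[` falls back to
# data.frame behaviour in that case).
df <- data.frame(ID = c(1016L, 1017L, 1019L),
                 accuracy = c(0L, 0L, 100L))

# The keyed-lookup syntax from the question, evaluated on a data.frame.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  {
    df[list(1016L)]
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

If the captured message matches the one reported by the worker nodes, that supports the idea that the subsetting is being done the data.frame way rather than the data.table way.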

Importing data.table via the namespace solved this for me.

More specifically, I got to this answer via the data.table FAQ.

6.9 I have created a package that depends on data.table. How do I ensure my package is data.table-aware so that inheritance from data.frame works?

Either i) include data.table in the Depends: field of your DESCRIPTION file, or ii) include data.table in the Imports: field of your DESCRIPTION file AND import(data.table) in your NAMESPACE file.
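Since the question mentions roxygen2, option ii) can also be expressed as a roxygen tag instead of a hand edit to NAMESPACE. This is a sketch under some assumptions: the file name below is hypothetical, and roxygen2 will only write the NAMESPACE file if it manages that file (a hand-written NAMESPACE, such as one containing only the exportPattern line, may need to be removed first so that roxygen2 can regenerate it).

```r
# R/mypackage-package.R  (hypothetical file name; any file under R/ should do)

# The @import tag asks roxygen2 to emit import(data.table) in NAMESPACE.
#' @import data.table
NULL

# After regenerating the documentation, e.g. with roxygen2::roxygenise()
# or devtools::document(), NAMESPACE is expected to contain the line:
#   import(data.table)
#
# Per option ii) of the FAQ, DESCRIPTION also needs data.table listed:
#   Imports:
#       data.table
```

Exports then have to be declared with @export tags on each function, because a roxygen2-generated NAMESPACE does not include the exportPattern line unless you add it yourself.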

Akhil Nair
  • Thank you. I appreciate your swift answer. Now I need to go read up more on the NAMESPACE file. This is the first time I have touched it. – statsNoob Sep 03 '15 at 14:12