
I have 1000 csv files in my working directory and each file has a location Id, rainfall and temperature. The structure of one file is shown below:

set.seed(123)
my.dat <- data.frame(Id = rep(1, each = 365),
                     rain = runif(365, min = 0, max = 20),
                     tmean = sample(20:40, 365, replace = T))
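
To reproduce the setup, a minimal sketch like the one below writes a few files in the same layout. The `weather_<id>.csv` naming follows the loop further down; the temporary output directory, the three-location subset, and the per-file seeds are assumptions made only for illustration.

```r
# Write a few example weather files into a temporary directory.
# out_dir and the choice of three ids are hypothetical, for illustration only.
out_dir <- file.path(tempdir(), "weather_demo")
dir.create(out_dir, showWarnings = FALSE)

ids <- 1:3
for (id in ids) {
  set.seed(123 + id)  # a different seed per file so the files differ
  dat <- data.frame(Id = rep(id, each = 365),
                    rain = runif(365, min = 0, max = 20),
                    tmean = sample(20:40, 365, replace = TRUE))
  write.csv(dat,
            file.path(out_dir, paste0("weather_", id, ".csv")),
            row.names = FALSE)
}

# Read one file back to confirm the structure
check <- read.csv(file.path(out_dir, "weather_1.csv"))
str(check)
```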

I wrote an Rcpp function that is also stored in my working directory. It takes the rainfall and temperature data and calculates two derived variables, var1 and var2. I want to read each location's weather data, apply the function, and save the corresponding output, using the foreach package to run the locations in parallel.

library(doParallel)  # also attaches foreach and parallel

location.vec <- 1:1000
  
myClusters <- makeCluster(6) 
registerDoParallel(myClusters)

foreach(i = seq_along(location.vec), 
       .packages = c('Rcpp', 'dplyr', 'data.table'), 
       .noexport = c('myRcppFunc'),
       .verbose = TRUE) %dopar% 

{
   
  Rcpp::sourceCpp('myRcppFunc.cpp')  
   
  idRef <- location.vec[i]
 
  # read the weather data
  temp_weather <- fread(paste0('weather_',idRef,'.csv'))
   
  # apply my Rcpp function
  temp_weather[, c("var1","var2") := myRcppFunc(rain, tmean)]
   
  # save my output
  fwrite(temp_weather, paste0('weather_', idRef, '_modified.csv'))
}
 
stopCluster(myClusters)        

This loop behaves erratically. Each time I run it, it gets stuck at a different iteration (sometimes 10, sometimes 40, and so on), and I then have to kill the job.

My question: is this caused by multiple processes trying to access the Rcpp function at the same time? How can I fix it? Can I load the Rcpp function through a foreach argument so that I don't have to source it in every iteration? Any other advice?

Thanks

89_Simple
  • That has been discussed a number of times so someone may find a duplicate, but in short: Do NOT use `sourceCpp()`. Build a package, load it on each worker. Parallel calls of a _properly set up_ Rcpp function work just as well as any other compiled R function. But no shortcuts. Also: you misspelled it: `myRcppFunc`. Not Rccp. – Dirk Eddelbuettel Jul 19 '20 at 23:40
  • Thank you. I tried to look it up. Any chance you can point me to resources on how to put an Rcpp function into a package? – 89_Simple Jul 19 '20 at 23:45
  • [SO post here](https://stackoverflow.com/questions/14288254/moving-from-sourcecpp-to-a-package-w-rcpp). [Advanced R](http://adv-r.had.co.nz/Rcpp.html#rcpp-package). – Croote Jul 20 '20 at 00:34
  • Use a search engine, or just search here. Enter `[rcpp] build package` in the search bar above to search for _build [a] package_ in the context of Rcpp selected by the tag `[rcpp]`. That simple search alone yields 284 answers. You could also try Google etc. We have been writing about this and explaining for a decade, you can take advantage of that just by searching. Link with results: https://stackoverflow.com/search?q=%5Brcpp%5D+build+package – Dirk Eddelbuettel Jul 20 '20 at 02:29
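
The pattern suggested in the comments (set each worker up once, rather than calling `sourceCpp()` inside every iteration) could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses the `parallel` package that ships with R instead of foreach so that it runs without extra installs, a plain-R stand-in replaces the compiled `myRcppFunc`, the package name `myRcppPkg` is hypothetical, and the weather data is simulated rather than read from the CSV files.

```r
library(parallel)

# Stand-in for the compiled function (hypothetical body). With a real
# package, this definition would not be needed at all.
myRcppFunc <- function(rain, tmean) {
  list(var1 = cumsum(rain), var2 = tmean - mean(tmean))
}

cl <- makeCluster(2)

# One-time setup per worker. With an installed package (hypothetical name
# myRcppPkg) this step would instead be:
#   clusterEvalQ(cl, library(myRcppPkg))
clusterExport(cl, "myRcppFunc")

# Work done for a single location; the data is simulated here instead of
# being read with fread(), to keep the sketch self-contained.
process_one <- function(id) {
  set.seed(id)
  dat <- data.frame(Id = id,
                    rain = runif(365, min = 0, max = 20),
                    tmean = sample(20:40, 365, replace = TRUE))
  out <- myRcppFunc(dat$rain, dat$tmean)
  dat$var1 <- out$var1
  dat$var2 <- out$var2
  nrow(dat)  # return something small instead of the full table
}

res <- parLapply(cl, 1:4, process_one)
stopCluster(cl)
```

With foreach, the analogous step would be to pass the package name through `.packages`, so that each worker loads the already compiled code instead of recompiling it in every iteration.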
