
I know the problem I have is a thread-safety issue: the code I have now runs fine with 'setThreadOptions(1)'. My question is: what would be good practice to overcome this?

I know that the approach in 'Threadsafe function pointer with Rcpp and RcppParallel via std::shared_ptr' will come into play somehow. I have also been thinking about, and experimenting with, making the internal function part of the structure for the parallel worker. Realistically, I am calling two internal functions, and I would like one to be variable and the other to be constant, which makes me think I will need two solutions.
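One possible reading of the "one variable, one constant" requirement, sketched under assumptions: treat the seeded shuffle as the constant step and the statistic as the variable step, and give the worker a plain C++ function pointer for the latter. Nothing here touches the R API, which is what makes concurrent calls safe. The names `shuffled_copy`, `first_two_over_len`, `StatFn` and `ShuffleWorker` are hypothetical, and `std::vector<double>` stands in for `arma::vec` so the sketch compiles without Rcpp; it is not the std::shared_ptr technique from the linked question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Constant step (hypothetical helper): a copy of `in` shuffled with a
// deterministic per-replicate seed. Pure C++, no R API.
inline std::vector<double> shuffled_copy(const std::vector<double>& in,
                                         std::size_t seed) {
    std::mt19937 engine(static_cast<std::mt19937::result_type>(seed));
    std::vector<double> out = in;
    std::shuffle(out.begin(), out.end(), engine);
    return out;
}

// One possible variable step: the statistic computed by myfunc,
// with std::vector<double> standing in for arma::vec.
inline double first_two_over_len(const std::vector<double>& v) {
    assert(v.size() >= 2);
    return (v[0] + v[1]) / static_cast<double>(v.size());
}

// Signature every variable step must have.
using StatFn = double (*)(const std::vector<double>&);

// Same shape as an RcppParallel Worker: operator()(begin, end) fills
// output[begin, end), so each index k is written by exactly one call.
struct ShuffleWorker {
    const std::vector<double>& input;
    std::vector<double>& output;
    StatFn stat;  // chosen by the caller; must not call into R

    ShuffleWorker(const std::vector<double>& in, std::vector<double>& out,
                  StatFn f)
        : input(in), output(out), stat(f) {}

    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t k = begin; k < end; ++k) {
            output[k] = stat(shuffled_copy(input, k));
        }
    }
};
```

Passing `&first_two_over_len` reproduces the question's computation; any other function, or non-capturing lambda, with the `StatFn` signature swaps the statistic without changing the worker.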

The error is that the R session in RStudio crashes. Two things of note here: 1. if I call 'setThreadOptions(1)', this runs fine; 2. if I move 'myfunc' into the main cpp file and call it simply as 'myfunc', this also runs fine.

Here is a detailed example:

First cpp file:

// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::interfaces(cpp)]]
// [[Rcpp::plugins(cpp11)]]
#include "RcppArmadillo.h"
using namespace arma;
using namespace std;

// [[Rcpp::export]]
double myfunc(arma::vec vec_in){

  int Len = arma::size(vec_in)[0];
  return (vec_in[0] +vec_in[1])/Len;
}

Second cpp file:

// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(RcppParallel)]]
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(ParallelExample)]]

#include "RcppArmadillo.h"
#include "RcppParallel.h"
#include "ParallelExample.h"
#include <random>
#include <algorithm>

using namespace Rcpp;
using namespace arma;
using namespace RcppParallel;
using namespace std;

struct PARALLEL_WORKER : public Worker{

  const arma::vec &input;
  arma::vec &output;

  PARALLEL_WORKER(const arma::vec &input, arma::vec &output) : input(input), output(output) {}

  void operator()(std::size_t begin, std::size_t end){

    std::mt19937 engine(1);

    // Loop over this thread's chunk [begin, end) of the Len_in replicates
    for (std::size_t k = begin; k < end; k++){
      engine.seed(k);
      arma::vec index = input;
      std::shuffle(index.begin(), index.end(), engine);

      output[k] = ParallelExample::myfunc(index);
    }
  }

};

// [[Rcpp::export]]
arma::vec Parallelfunc(int Len_in){

  arma::vec input = arma::regspace(0, 500);
  arma::vec output(Len_in);

  PARALLEL_WORKER  parallel_worker(input, output);
  parallelFor( 0, Len_in, parallel_worker);
  return output;
}

Makevars (I am using a Mac):

CXX_STD = CXX11

PKG_CXXFLAGS +=  -I../inst/include

And NAMESPACE:

exportPattern("^[[:alpha:]]+")
importFrom(Rcpp, evalCpp)
importFrom(RcppParallel,RcppParallelLibs)
useDynLib(ParallelExample, .registration = TRUE)

export(Parallelfunc)
skatz
  • Error message? What data type is `output`? How is `i` defined? You use `random`: how is that used? There are many more questions like this. They all boil down to: where is the minimal, complete, and verifiable example ([mcve])? – Ralf Stubner Aug 23 '18 at 04:47
  • As the example has several files, it isn't the easiest to run. As for the error, I presume it is a thread-safety issue, since it crashes RStudio the same way as other thread-safety issues I have worked through. And again, the problem only occurs when I call the secondary function through the cpp header. – skatz Aug 23 '18 at 07:53
  • I will have a look. Is the second cpp file also part of the package? I would leave out all these `using namespace ...`. – Ralf Stubner Aug 23 '18 at 08:58

2 Answers


When you call ParallelExample::myfunc, you are calling a function defined in inst/include/ParallelExample_RcppExport.h, which uses the R API. This is something one must not do in a parallel context. I see two possibilities:

  1. Make myfunc header-only and include it in inst/include/ParallelExample.h.
  2. If the second cpp file is within the same package, put a suitable declaration for myfunc into src/first.h, include that file in both src/first.cpp and src/second.cpp, and call myfunc instead of ParallelExample::myfunc. After all, it is not necessary to register a function with R if you only want to call it within the same package. Registering with R is for functions that are called from outside the package.
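To illustrate why both options avoid the crash, here is a minimal sketch in plain C++. It assumes `std::thread` plus a hand-written chunking loop (the hypothetical `simple_parallel_for`) as a stand-in for `RcppParallel::parallelFor`, and `std::vector<double>` instead of `arma::vec`. The function called from the threads only reads its argument and never calls into R, so concurrent calls are safe, and each thread writes to a disjoint range of `output`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

// Header-only style: `inline` lets this definition sit in a header that is
// included from several .cpp files without breaking the one-definition rule.
inline double myfunc(const std::vector<double>& v) {
    assert(v.size() >= 2);
    return (v[0] + v[1]) / static_cast<double>(v.size());
}

// Hypothetical stand-in for parallelFor: split [0, n) into contiguous
// chunks and run body(begin, end) for each chunk on its own thread.
template <typename Body>
void simple_parallel_for(std::size_t n, std::size_t n_threads, Body body) {
    assert(n_threads > 0);
    const std::size_t chunk = (n + n_threads - 1) / n_threads;
    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        threads.emplace_back(body, begin, end);
    }
    for (std::thread& t : threads) t.join();
}

// Same work as the question's worker, with no R API calls inside threads.
std::vector<double> parallel_func(std::size_t len, std::size_t n_threads) {
    std::vector<double> input(501);
    for (std::size_t i = 0; i < input.size(); ++i) input[i] = static_cast<double>(i);
    std::vector<double> output(len);
    simple_parallel_for(len, n_threads, [&](std::size_t begin, std::size_t end) {
        std::mt19937 engine;
        for (std::size_t k = begin; k < end; ++k) {
            engine.seed(static_cast<std::mt19937::result_type>(k));
            std::vector<double> index = input;  // per-iteration copy
            std::shuffle(index.begin(), index.end(), engine);
            output[k] = myfunc(index);          // disjoint write per k
        }
    });
    return output;
}
```

Because every replicate k gets its own seed, the result does not depend on how the range is split: `parallel_func(37, 1)` and `parallel_func(37, 4)` return identical vectors.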
Ralf Stubner

In some ways this kind of defeats the purpose of Rcpp's built-in 'interfaces(cpp)' feature.

First file, saved as 'ExampleInternal.h':

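// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp11)]]
#ifndef EXAMPLEINTERNAL_H
#define EXAMPLEINTERNAL_H

#include "RcppArmadillo.h"

namespace ExampleInternal
{

  // inline: the definition lives in a header, so it may be included
  // from several .cpp files without violating the one-definition rule
  inline double myfunc3(const arma::vec& vec_in){

    int Len = arma::size(vec_in)[0];
    return (vec_in[0] + vec_in[1])/Len;
  }

}

#endif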

And the second file:

#include "RcppArmadillo.h"
#include "RcppParallel.h"
#include "ParallelExample.h"
#include "ExampleInternal.h"
#include <algorithm>
#include <random>

using namespace Rcpp;
using namespace arma;
using namespace RcppParallel;
using namespace ExampleInternal;
using namespace std;

struct PARALLEL_WORKER : public Worker{

  const arma::vec &input;
  arma::vec &output;

  PARALLEL_WORKER(const arma::vec &input, arma::vec &output) : input(input), output(output) {}

  void operator()(std::size_t begin, std::size_t end){

    std::mt19937 engine(1);

    // Loop over this thread's chunk [begin, end) of the Len_in replicates
    for (std::size_t k = begin; k < end; k++){
      engine.seed(k);
      arma::vec index = input;
      std::shuffle(index.begin(), index.end(), engine);

      output[k] = ExampleInternal::myfunc3(index);
    }
  }

};

// [[Rcpp::export]]
arma::vec Parallelfunc(int Len_in){

  arma::vec input = arma::regspace(0, 500);
  arma::vec output(Len_in);

  PARALLEL_WORKER  parallel_worker(input, output);
  parallelFor( 0, Len_in, parallel_worker);
  return output;
}
skatz
  • What "defeats the purpose of the built in interface cpp feature of Rcpp"? That feature is meant for functions that can be called from an external package via an interface provided by R. Since it is an R interface, it is not thread safe. However, there is no need to use the R interface *within* the package. And having to write header files to make code available for different source files is normal for C++. – Ralf Stubner Aug 25 '18 at 17:49
  • Point well taken; I hadn't thought about calling a C++ function from outside the package. – skatz Aug 26 '18 at 07:30
  • Second, while researching header files, I came across what I think would be the best solution: add 'double myfunc(arma::vec vec_in);' to the second cpp file. If I remember correctly, this is a function declaration, and it tells C++ that the function exists. So unless you need a fair number of internal functions, this saves having to create a header file or namespace. – skatz Aug 26 '18 at 07:35
  • As long as you are careful about keeping this consistent, you can do it like this. It gets problematic if you have more than one function that you use in more than one place. In that case it is common practice to create a header file with the function declaration (no namespace or function definition needed) and include that file in all the files where you use the function. – Ralf Stubner Aug 26 '18 at 08:57
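The declaration-versus-definition distinction from these comments, sketched in one file for brevity. The file names in the comments (`src/myfunc.h`, `second.cpp`, `first.cpp`) and the helper `call_twice` are hypothetical, and `std::vector<double>` stands in for `arma::vec`. In a real package each marked section would be its own file; with the header approach, the single declaration line is what goes into the shared header.

```cpp
#include <cassert>
#include <vector>

// --- would live in a shared header, e.g. a hypothetical src/myfunc.h ---
// A declaration: states that myfunc exists and what its signature is.
// Every .cpp that calls myfunc includes this one line via the header,
// so the signature cannot drift between files.
double myfunc(const std::vector<double>& vec_in);

// --- a caller, e.g. in second.cpp; compiles against the declaration alone ---
double call_twice(const std::vector<double>& v) {
    return myfunc(v) + myfunc(v);
}

// --- the single definition, e.g. in first.cpp ---
// If its parameter types differed from the declaration, it would be a
// different overload, and calls compiled against the declaration would
// fail at link time rather than at compile time.
double myfunc(const std::vector<double>& vec_in) {
    assert(vec_in.size() >= 2);
    return (vec_in[0] + vec_in[1]) / static_cast<double>(vec_in.size());
}
```

Hand-copying the declaration into each file, as in the comment above, works the same way; the header only guarantees that every copy is identical.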