
I am fitting a statistical model whose likelihood and gradient evaluations depend on many complicated matrix operations, for which the Eigen library is very well-suited. I am working within Rcpp and having great success with RcppEigen. My likelihood/gradient code contains a loop over the data that is, in theory, trivially parallelizable. I would like to parallelize my likelihood computation using RcppParallel.

RcppParallel requires its input as an Rcpp::NumericVector or Rcpp::NumericMatrix. I am able to convert between Eigen::MatrixXd and Rcpp::NumericMatrix thanks to a very helpful post here. In my example below, I reproduce one of the RcppParallel examples using this conversion. However, the performance hit of doing the conversion appears too severe to make the parallel computation worthwhile.
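
For concreteness, the conversion I am referring to is essentially the following round trip (roundTripEigen is just an illustrative name and is not part of my actual code); note that both wrap() and as<Eigen::MatrixXd>() copy the data:

// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>

// Illustrative round trip between Eigen and Rcpp matrix types.
// [[Rcpp::export]]
Eigen::MatrixXd roundTripEigen(Eigen::MatrixXd x) {
    Rcpp::NumericMatrix m = Rcpp::wrap(x);                 // Eigen::MatrixXd -> Rcpp::NumericMatrix (copy)
    Eigen::MatrixXd back  = Rcpp::as<Eigen::MatrixXd>(m);  // Rcpp::NumericMatrix -> Eigen::MatrixXd (copy)
    return back;
}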

My questions are as follows:

  1. Can an RcppParallel worker take/operate on an Eigen::MatrixXd directly?
  2. Can anybody suggest an alternative method of parallelizing a loop involving a significant amount of Eigen-specific matrix algebra?

Some further information:

  • Re-writing my entire program to not use Eigen is very undesirable.
  • My local machine is an M1 Mac; I have access to servers running Ubuntu, so I can move to that environment if that might help.

Here is an example from the RcppParallel documentation, and a modification with Eigen classes that does compile, run, and give the right answer, but incurs a substantial performance hit. My desired outcome is to be able to take Eigen-classed objects as input and output without incurring this performance hit, either with RcppParallel or some other method of parallelization.

Note that this case doesn't even cover what I actually want, which is to be able to use Eigen::MatrixXd etc. within the RcppParallel Worker (a rough sketch of what I mean appears after the benchmark results below). The RcppParallel documentation appears to explicitly advise against doing this type of conversion within the Worker: "The code that you write within parallel workers should not call the R or Rcpp API in any fashion." Nonetheless, even the code below, which only converts outside the Worker, incurs a performance hit, which has me wondering about the overall feasibility of what I would like to do here.

// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::depends(RcppParallel)]]
#include <RcppEigen.h>     // RcppEigen.h pulls in Rcpp.h itself, in the correct order
#include <RcppParallel.h>
#include <math.h>

using namespace Rcpp;
using namespace RcppParallel;


// Square root all the elements in a big matrix
struct SquareRoot : public Worker {
    const RMatrix<double> input;
    RMatrix<double> output;

    // Constructor
    SquareRoot(const NumericMatrix input, NumericMatrix output) : input(input), output(output) {}
    // Call operator
    void operator()(std::size_t begin,std::size_t end) {
        std::transform(input.begin()+begin,
                       input.begin()+end,
                       output.begin()+begin,
                       (double(*)(double)) sqrt);
    }
};


// [[Rcpp::export]]
NumericMatrix parallelMatrixSqrt(NumericMatrix x) {
    NumericMatrix output(x.nrow(),x.ncol());

    SquareRoot mysqrt(x,output);
    parallelFor(0,x.length(),mysqrt,100);

    return output;
}

// Try it with an Eigen input and output, compare performance
// [[Rcpp::export]]
Eigen::MatrixXd parallelMatrixSqrtEigen(Eigen::MatrixXd x) {
    // Convert to NumericMatrix (wrap() copies the data)
    SEXP s = wrap(x);
    NumericMatrix output(x.rows(),x.cols());
    NumericMatrix xR(s);
    // Do the parallel computations
    SquareRoot mysqrt(xR,output);
    parallelFor(0,xR.length(),mysqrt,100);
    // Convert back to Eigen (another copy)
    Eigen::MatrixXd s2 = as<Eigen::MatrixXd>(output);

    return s2;
}

Performance:

codepath <- "~/work/projects/misc/rcppparallel"
codefile <- "try-rcppparallel.cpp"

library(Rcpp)
library(RcppParallel)

sourceCpp(file.path(codepath,codefile))

n <- 1000
mm <- matrix(rnorm(n*n)^2,n,n)
microbenchmark::microbenchmark(
    sqrt(mm),
    parallelMatrixSqrt(mm),
    parallelMatrixSqrtEigen(mm)
)

Results:

Unit: microseconds
                        expr      min        lq      mean    median       uq      max neval cld
                    sqrt(mm) 1179.242 1294.6775 1692.3123 1384.7955 1464.213 3581.924   100  b 
      parallelMatrixSqrt(mm)  321.973  496.3665  909.3078  571.0685  784.617 3582.785   100 a  
 parallelMatrixSqrtEigen(mm) 3223.133 3504.1675 4422.4277 4071.7715 5313.190 6737.735   100   c
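
For completeness, the kind of worker I would ideally like to write looks something like the rough, untested sketch below (SquareRootEigen and parallelMatrixSqrtEigenMap are placeholder names, and this would be appended to the same file as above). The Worker stores only RcppParallel's thread-safe RMatrix accessors, and inside operator() it builds Eigen::Map views over their raw buffers, so no R/Rcpp API is touched in the parallel region; the loop runs over columns rather than individual elements. I do not know whether this is actually safe or any faster, which is essentially question 1 above:

// Hypothetical sketch (untested): keep only the thread-safe RMatrix accessors
// in the Worker and view their raw buffers through Eigen::Map inside
// operator(), so the parallel region uses Eigen only and never the R/Rcpp API.
struct SquareRootEigen : public Worker {
    const RMatrix<double> input;
    RMatrix<double> output;

    SquareRootEigen(const NumericMatrix input, NumericMatrix output)
        : input(input), output(output) {}

    // begin/end index a block of columns; R matrices are column-major, so the
    // block is contiguous in memory and can be mapped without copying.
    void operator()(std::size_t begin, std::size_t end) {
        const std::size_t nrow = input.nrow();
        Eigen::Map<const Eigen::MatrixXd> in(input.begin() + begin * nrow, nrow, end - begin);
        Eigen::Map<Eigen::MatrixXd> out(output.begin() + begin * nrow, nrow, end - begin);
        out = in.cwiseSqrt();   // plain Eigen from here on
    }
};

// [[Rcpp::export]]
NumericMatrix parallelMatrixSqrtEigenMap(NumericMatrix x) {
    NumericMatrix output(x.nrow(), x.ncol());
    SquareRootEigen w(x, output);
    parallelFor(0, x.ncol(), w);   // iterate over columns
    return output;
}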

A major thank you in advance!

Comments:

  • Welcome to StackOverflow. In short, it depends where you instantiate it. You cannot use RcppEigen objects that you passed in from R, as those are (likely) mapped to R memory, and that violates the 'no multithreading with R objects as you may tickle garbage collection' rule. The reason RcppParallel has its own types is to be separate; if your memory use remains separate (as in: Eigen just in C++, no objects received from R) then you may be able to do this. – Dirk Eddelbuettel Mar 24 '23 at 15:10
  • Thank you! The answer to my question appears to be "no" in this case. I very much appreciate the quick response, and may I say I very much appreciate the Rcpp package overall; it is truly incredible and has had a major positive impact on my research. – astring Mar 26 '23 at 16:13

0 Answers