0

I'm trying to improve the performance of a row-wise custom distance measure over a large matrix using the parallelDist R package and RcppArmadillo. The example they provide works with me

library(parallelDist)
library(RcppArmadillo)
library(RcppXPtrUtils)
euclideanFuncPtr <- cppXPtr("double customDist(const arma::mat &A, const arma::mat &B) {return sqrt(arma::accu(arma::square(A - B))); }",depends = c("RcppArmadillo"))
# distance matrix for user-defined euclidean distance function
# (note that method is set to "custom")
x = matrix(1:16,ncol=2)
parDist(x, method="custom", func = euclideanFuncPtr)
# same result as dist(x) 

I want to do something similar with the following:

overlapSlow = function(x,y){sum(pmin(x,y))/sum(pmax(x,y))}
x = matrix(1:16,ncol=2)
res = matrix(NA,nrow=8,ncol=8)
for (i in 1:nrow(x)) {
    for (j in 1:nrow(x)){
        if (i>j)
        {
            res[i,j]=overlapSlow(x[i,],x[j,])
        }
    }
}
res=as.dist(res)

But the following Xptr object fails to compile:

overlap <- cppXPtr("double customDist(const arma::mat &A, const arma::mat &B) {arma::accu(pmin(A,B)) / arma::accu(pmax(A,B)); }",depends = c("RcppArmadillo"))

Throwing the error message

error: no matching function for call to ‘pmin(const mat&, const mat&)

I never used Rcpp before, but I suppose this does not work because pmax and pmin require a Numeric vector, but I am struggling to find a way to convert the objects A and B internally.

NOTE: parallelDist vignette mentions that "The user-defined function needs to have the following signature: double customDist(const arma::mat &A, const arma::mat &B) Note that the return value must be a double and the two parameters must be of type const arma::mat &param." so I cannot simply make changes in the signature.

Thanks in advance for any help/tips!

E

UPDATE 1 I could, of course, do this entirely in Rcpp (without using the structure imposed by parellelDist). The following does work fine:

// [[Rcpp::depends(RcppProgress)]]
#include <progress.hpp>
#include <progress_bar.hpp>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix overlapDistance(Rcpp::NumericMatrix x, bool display_progress=true) {
  int n = x.nrow();
  Progress p(n*n, display_progress);
  NumericMatrix m( n );
  for (int i=0; i < n; ++i) {
    if (Progress::check_abort() )
      return -1.0;
    for (int j=0; j < n; ++j){
      p.increment(); // update progress
      if (i>j)
      {m(i,j)= sum(pmin(x(i,_),x(j,_)))/sum(pmax(x(i,_),x(j,_)));}
    }
  }
  return m;
}

and it is much faster than plain R, but still too slow for what I want to achieve...

UPDATE 2 I tried the following

library(parallelDist)
library(RcppArmadillo)
library(RcppXPtrUtils)
overlap <- cppXPtr("double customDist(const arma::mat &A, const arma::mat &B) 
                   {Rcpp::NumericMatrix x = as<Rcpp::NumericMatrix>(wrap(A));
                   Rcpp::NumericMatrix y = as<Rcpp::NumericMatrix>(wrap(B));
                   return sum(pmin(x,y)) / sum(pmax(x,y)); }",depends = c("RcppArmadillo"))
x= matrix(1:16, ncol=2)
parDist(x, method="custom", func = overlap)

which compiles without error and runs. However when I re-ran the line parDist(x, method="custom", func = overlap) I get

Error in parDist(x, method = "custom", func = overlap) : Not a matrix.

and when I tried the third time I get

Error in parDist(x, method = "custom", func = overlap) : bad value

and eventually


 *** caught segfault ***
address 0x140, cause 'memory not mapped'

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: parDist(x, method = "custom", func = overlap)
Traceback:


Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:  1: parDist(x, method = "custom", func = overlap)
  • The error is correct. While we have `pmin` and `pmax` in Rcpp they are defined for another matrix type. C++ being typed. it sees no such function to call for the arguments you call it with here _i.e._ the corresponding 'signature' in C++ parlance. You need to adjust your code. – Dirk Eddelbuettel Nov 13 '20 at 13:48
  • Thanks - but how do I adjust the code? How do I convert an armadillo `arma::mat` into Rcpp's `NumericMatrix`? – Enrico Crema Nov 13 '20 at 14:47
  • 1
    You need to look into `Rcpp::as<>` . I would also recommend to try that in a little function leaving some space for debug prints etc rather than cramming it into one-liners. – Dirk Eddelbuettel Nov 13 '20 at 15:01
  • 1
    You may also consider writing your own custom `pmin` and `pmax` for Armadillo matrices to save the conversion back and forth. Though the conversion should be simple and light, there is no underlying copy involved. – Dirk Eddelbuettel Nov 13 '20 at 15:03
  • Thanks - will try doing this entirely in armadillo as I am clearly struggling with the conversion... – Enrico Crema Nov 13 '20 at 15:09

0 Answers0