I'm trying to improve the performance of a row-wise custom distance measure over a large matrix using the parallelDist R package and RcppArmadillo. The example they provide works with me
library(parallelDist)
library(RcppArmadillo)
library(RcppXPtrUtils)
euclideanFuncPtr <- cppXPtr("double customDist(const arma::mat &A, const arma::mat &B) {return sqrt(arma::accu(arma::square(A - B))); }",depends = c("RcppArmadillo"))
# distance matrix for user-defined euclidean distance function
# (note that method is set to "custom")
x = matrix(1:16,ncol=2)
parDist(x, method="custom", func = euclideanFuncPtr)
# same result as dist(x)
I want to do something similar with the following:
overlapSlow = function(x,y){sum(pmin(x,y))/sum(pmax(x,y))}
x = matrix(1:16,ncol=2)
res = matrix(NA,nrow=8,ncol=8)
for (i in 1:nrow(x)) {
for (j in 1:nrow(x)){
if (i>j)
{
res[i,j]=overlapSlow(x[i,],x[j,])
}
}
}
res=as.dist(res)
But the following Xptr
object fails to compile:
overlap <- cppXPtr("double customDist(const arma::mat &A, const arma::mat &B) {arma::accu(pmin(A,B)) / arma::accu(pmax(A,B)); }",depends = c("RcppArmadillo"))
Throwing the error message
error: no matching function for call to ‘pmin(const mat&, const mat&)
I never used Rcpp before, but I suppose this does not work because pmax
and pmin
require a Numeric vector, but I am struggling to find a way to convert the objects A and B internally.
NOTE: parallelDist vignette mentions that
"The user-defined function needs to have the following signature:
double customDist(const arma::mat &A, const arma::mat &B)
Note that the return value must be a double and the two parameters must be of type const
arma::mat ¶m." so I cannot simply make changes in the signature.
Thanks in advance for any help/tips!
E
UPDATE 1 I could, of course, do this entirely in Rcpp (without using the structure imposed by parellelDist). The following does work fine:
// [[Rcpp::depends(RcppProgress)]]
#include <progress.hpp>
#include <progress_bar.hpp>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix overlapDistance(Rcpp::NumericMatrix x, bool display_progress=true) {
int n = x.nrow();
Progress p(n*n, display_progress);
NumericMatrix m( n );
for (int i=0; i < n; ++i) {
if (Progress::check_abort() )
return -1.0;
for (int j=0; j < n; ++j){
p.increment(); // update progress
if (i>j)
{m(i,j)= sum(pmin(x(i,_),x(j,_)))/sum(pmax(x(i,_),x(j,_)));}
}
}
return m;
}
and it is much faster than plain R, but still too slow for what I want to achieve...
UPDATE 2 I tried the following
library(parallelDist)
library(RcppArmadillo)
library(RcppXPtrUtils)
overlap <- cppXPtr("double customDist(const arma::mat &A, const arma::mat &B)
{Rcpp::NumericMatrix x = as<Rcpp::NumericMatrix>(wrap(A));
Rcpp::NumericMatrix y = as<Rcpp::NumericMatrix>(wrap(B));
return sum(pmin(x,y)) / sum(pmax(x,y)); }",depends = c("RcppArmadillo"))
x= matrix(1:16, ncol=2)
parDist(x, method="custom", func = overlap)
which compiles without error and runs. However when I re-ran the line parDist(x, method="custom", func = overlap)
I get
Error in parDist(x, method = "custom", func = overlap) : Not a matrix.
and when I tried the third time I get
Error in parDist(x, method = "custom", func = overlap) : bad value
and eventually
*** caught segfault ***
address 0x140, cause 'memory not mapped'
*** caught segfault ***
address (nil), cause 'memory not mapped'
Traceback:
1: parDist(x, method = "custom", func = overlap)
Traceback:
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 1: parDist(x, method = "custom", func = overlap)