Rcpp: my distance matrix program is slower than the function in package

Question

I would like to calculate the pairwise euclidean distance matrix. I wrote Rcpp programs by the suggestion of Dirk Eddelbuettel as follows

NumericMatrix calcPWD1 (NumericMatrix x){
  int outrows = x.nrow();
  double d;
  NumericMatrix out(outrows,outrows);

  for (int i = 0 ; i < outrows - 1; i++){
    for (int j = i + 1  ; j < outrows ; j ++){
      NumericVector v1= x.row(i);
      NumericVector v2= x.row(j);
      NumericVector v3=v1-v2;
      d = sqrt(sum(pow(v3,2)));
      out(j,i)=d;
      out(i,j)=d;
    }
  }
  return (out) ;
}

But I find my program is slower than dist function.

> benchmark(as.matrix(dist(b)),calcPWD1(b))
                test replications elapsed relative user.self sys.self user.child sys.child
1 as.matrix(dist(b))          100  24.831    1.000    24.679    0.010          0         0
2        calcPWD1(b)          100  27.362    1.102    27.346    0.007          0         0

Do you guys have any suggestion? My matrix is very simple. There is no column names or row names, just plain matrix (for example like b=matrix(c(rnorm(1000*10)),1000,10)). Here is the program of dist

> dist
function (x, method = "euclidean", diag = FALSE, upper = FALSE, 
    p = 2) 
{
    if (!is.na(pmatch(method, "euclidian"))) 
        method <- "euclidean"
    METHODS <- c("euclidean", "maximum", "manhattan", "canberra", 
        "binary", "minkowski")
    method <- pmatch(method, METHODS)
    if (is.na(method)) 
        stop("invalid distance method")
    if (method == -1) 
        stop("ambiguous distance method")
    x <- as.matrix(x)
    N <- nrow(x)
    attrs <- if (method == 6L) 
        list(Size = N, Labels = dimnames(x)[[1L]], Diag = diag, 
            Upper = upper, method = METHODS[method], p = p, call = match.call(), 
            class = "dist")
    else list(Size = N, Labels = dimnames(x)[[1L]], Diag = diag, 
        Upper = upper, method = METHODS[method], call = match.call(), 
        class = "dist")
    .Call(C_Cdist, x, method, attrs, p)
}
<bytecode: 0x56b0d40>
<environment: namespace:stats>

I expect my program is faster than dist since in dist, there are too many thing to need to be checked (like method, diag).

Why would you expect it to be faster? Internally, `dist()` also uses compiled code... — Dirk Eddelbuettel, Apr 24 '16 at 23:51
@DirkEddelbuettel.I expect my program is faster than `dist` since in `dist`, there are too many thing to need to be checked (like `method`, `diag`). — Mike Brown, Apr 24 '16 at 23:56
That's why you want to profile to see how different parts take different amounts of time, given different input dimensions. Guessing is all good, but we often get it wrong. _Measuring_ if often preferable. — Dirk Eddelbuettel, Apr 25 '16 at 00:15
Does R ever use multiple threads to speed up calculations? Big matrix operations can often be parallelized. Also, could it be vectorizing calculations, using SIMD or AVX instructions? — Christopher Oicles, Apr 25 '16 at 05:17
@ChristopherOicles. Calculating distance matrix is in part of a subfunction. And I already put different subfunctions to different threads to parallel computing. I do not know whether it will be speed up if I do parallel computing in a another parallel computing. Also, I cannot find SIMD or AVX about R by google. Could you provide a reference? — Mike Brown, Apr 25 '16 at 05:48
I really don't know anything about RCPP, so keep that in mind... But I look at your NumericMatrix function, and I see normal-looking C++ `for` loops nested to 2 levels, variables being assigned values in a sequential manner (apparently serialized), and when I think of a parallel implementation of that function, I imagine all the math in the middle of the loop being dispatched to worker threads, each one handling some fraction of the loop's total iteration count. I noticed there is an RcppParallel project http://rcppcore.github.io/RcppParallel/ which seems to do this, though. — Christopher Oicles, Apr 25 '16 at 06:44
It seems that (1) extracting `v1`, (2) `v2`, (3) creating `v3`, (4) calling `pow`, (5) computing `sum`, all involve creating/scanning vectors of `length == ncol(x)`. Try doing all the above while reading each row of `x`; i.e. replace the four lines of `v1 = /v2 = /v3 = /d = ` in your function with `d = 0.0; for(int k = 0; k < nc; k++) d += pow(x(i, k) - x(j, k), 2); d = sqrt(d);`. — alexis_laz, Apr 25 '16 at 14:28

Dirk Eddelbuettel · Answer 1 · 2016-04-24T23:25:13.850

You were almost there. But your inner loop body tried to do too much in one line. Template programming is hard enough as it is, and sometimes it is just better to spread instructions out a little to give the compiler a better chance. So I just made it five statements, and built immediatelt.

New code:

#include <Rcpp.h>

using namespace Rcpp;

double dist1 (NumericVector x, NumericVector y){
  int n = y.length();
  double total = 0;
  for (int i = 0; i < n ; ++i) {
    total += pow(x(i)-y(i),2.0);
  }
  total = sqrt(total);
  return total;
}

// [[Rcpp::export]]
NumericMatrix calcPWD (NumericMatrix x){
  int outrows = x.nrow();
  int outcols = x.nrow();
  NumericMatrix out(outrows,outcols);

  for (int i = 0 ; i < outrows - 1; i++){
    for (int j = i + 1  ; j < outcols ; j ++) {
      NumericVector v1 = x.row(i);
      NumericVector v2 = x.row(j-1);
      double d = dist1(v1, v2);
      out(j-1,i) = d;
      out(i,j-1)= d;
    }
  }
  return (out) ;
}

/*** R
M <- matrix(log(1:9), 3, 3)
calcPWD(M)
*/

Running it:

R> sourceCpp("/tmp/mikebrown.cpp")

R> M <- matrix(log(1:9), 3, 3)

R> calcPWD(M)
         [,1]     [,2] [,3]
[1,] 0.000000 0.740322    0
[2,] 0.740322 0.000000    0
[3,] 0.000000 0.000000    0
R>

You may want to check your indexing logic though. Looks like you missed more comparisons.

Edit: For kicks, here is a more compact version of your distance function:

// [[Rcpp::export]]
double dist2(NumericVector x, NumericVector y){
  double d = sqrt( sum( pow(x - y, 2) ) );
  return d;
}

Thank you very much. But I compare the time between `dist` in R package and my own `calcPWD`. I find my function takes more time. Do you have any suggestion to improve my function? — Mike Brown, Apr 24 '16 at 23:12
For starters you do not need a loop. Look into the `Rcpp Sugar` vignette. Otherwise, profile... — Dirk Eddelbuettel, Apr 24 '16 at 23:15

coatless · Accepted Answer · 2016-04-25T20:04:26.513

Rcpp vs. Internal R Functions (C/Fortran)

First of all, just because you are writing the algorithm using Rcpp does not necessarily mean it will beat out the R equivalent, especially if the R function calls a C or Fortran routine to perform the bulk of the computations. In other cases where the function is written purely in R, there is a high probability that transforming it in Rcpp will yield the desired speed gain.

Remember, when rewriting internal functions, one is going up against the R Core team of absolutely insane C programmers most likely will win out.

Base Implementation of `dist()`

Secondly, the distance calculation R uses is done in C as indicated by:

.Call(C_Cdist, x, method, attrs, p)

, which is the last line of the dist() function's R source. This gives it a slight advantage vs. C++ as it more granular instead of templated.

Furthermore, the C implementation uses OpenMP when available to parallelize the computation.

Proposed modification

Thirdly, by changing the subset order slightly and avoiding creating an additional variable, the timings between versions decrease.

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::NumericMatrix calcPWD1 (const Rcpp::NumericMatrix & x){
  unsigned int outrows = x.nrow(), i = 0, j = 0;
  double d;
  Rcpp::NumericMatrix out(outrows,outrows);

  for (i = 0; i < outrows - 1; i++){
    Rcpp::NumericVector v1 = x.row(i);
    for (j = i + 1; j < outrows ; j ++){
      d = sqrt(sum(pow(v1-x.row(j), 2.0)));
      out(j,i)=d;
      out(i,j)=d;
    }
  }

  return out;
}

Rcpp: my distance matrix program is slower than the function in package

2 Answers2

Rcpp vs. Internal R Functions (C/Fortran)

Base Implementation of `dist()`

Proposed modification

Linked

Rcpp: my distance matrix program is slower than the function in package

2 Answers2

Rcpp vs. Internal R Functions (C/Fortran)

Base Implementation of dist()

Proposed modification

Linked

Base Implementation of `dist()`