
I am trying to write R code that returns a matrix containing the squared distance between all pairs of rows. Below is an implementation I have written. It works as expected, but it gets very slow as the number of rows grows. From my observations, the line `combn(seq(1,n),m=2)` takes the longest to run. Does anyone have suggestions for making the code more efficient for a large number of rows? Thanks in advance.

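```r
# Squared differences between the coordinates of every pair of rows of x.
# Returns n and a choose(n, 2) x ncol(x) matrix d; row k of d holds the
# differences for the k-th pair produced by combn().
gen.dist <- function(x){
  n <- nrow(x)
  # All row-index pairs (i, j) with i < j, one pair per column of idx
  idx <- combn(seq(1,n),m=2)
  # Run calc.distance on each column (coordinate) of x separately
  d <- apply(x, 2, calc.distance, combinations=idx,alpha=2)
  return(list(n=n,d=d))
}

# (x[i] - x[j])^alpha for every pair (i, j) in `combinations`,
# where x is a single column of the original matrix.
calc.distance <- function(x,combinations,alpha){
  x1 <- x[combinations[1,]]
  x2 <- x[combinations[2,]]
  output <- (x1 - x2)^alpha
  return(output)
}
```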
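A minimal sketch of one way to avoid `combn()` and the per-column `apply()`, assuming the goal is the same `choose(n, 2) x ncol(x)` matrix that `gen.dist()` returns. The pair indices are built directly with base R's `rep()` and `sequence()` in the same order as `combn()`, and the subtraction is done once on whole row blocks. `pair_sq_diff()` is a hypothetical name, and any speedup should be measured on your own data.

```r
# Hypothetical helper: squared coordinate differences for all row pairs,
# without combn() or apply(). Same row order as combn(n, 2).
pair_sq_diff <- function(x, alpha = 2) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(n >= 2)
  # First index: 1 repeated n-1 times, 2 repeated n-2 times, ...
  i <- rep(seq_len(n - 1), times = (n - 1):1)
  # Second index: i+1, ..., n for each i (sequence() concatenates seq_len's)
  j <- i + sequence((n - 1):1)
  # One vectorized subtraction over all pairs and all columns at once
  d <- (x[i, , drop = FALSE] - x[j, , drop = FALSE])^alpha
  list(n = n, d = d)
}

# Example: 4 rows, 2 coordinates gives a choose(4, 2) x 2 result
res <- pair_sq_diff(matrix(1:8, nrow = 4))
dim(res$d)  # 6 2
```

The result still has `choose(n, 2)` rows (about 50 million for n = 10,000), so for large n the memory needed to hold the output, rather than generating the indices, eventually becomes the limiting factor.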
  • Take a look [at this RcppParallel implementation](http://gallery.rcpp.org/articles/parallel-distance-matrix/); it might help. – shayaa Oct 21 '16 at 04:14
  • There is a built-in function, dist(), that can compute a number of distance metrics, including Euclidean distance, and returns the distances between all pairs of observations (as a "dist" object, which as.matrix() turns into a full matrix). I suspect it will be much faster. – gfgm Oct 21 '16 at 04:30
  • You are not calculating a single distance for every pair of rows; you are calculating the squared difference between the coordinates of every pair of rows. Is that what you need from a faster alternative as well, or would a single distance measure for every pair of rows suffice? – KenHBS Oct 21 '16 at 15:38
  • Thanks for the answers. I tried the built-in dist() function and it sped up the computation, but it is still slow when the matrix gets bigger. Yes, in the faster alternative I am looking for the squared difference between the coordinates of every pair of rows; a single distance measure would not be sufficient. – user3695689 Oct 24 '16 at 05:50
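To make the distinction in the comments concrete, a small sketch with hypothetical random data, assuming Euclidean distance: squaring the output of `dist()` gives the row sums of the per-coordinate matrix, and `dist()` stores its lower triangle in the same pair order as `combn()`. So `dist()` computes only the summed version, which is why it cannot replace the per-coordinate output asked for here.

```r
# Hypothetical example data: 6 rows, 3 coordinates
set.seed(42)
x <- matrix(rnorm(18), nrow = 6)

# Per-coordinate squared differences for every pair, in combn() order
idx <- combn(nrow(x), 2)
d <- (x[idx[1, ], , drop = FALSE] - x[idx[2, ], , drop = FALSE])^2

# dist() returns one Euclidean distance per pair; its lower-triangle
# storage order (2,1), (3,1), ..., (3,2), ... matches combn()'s order
sq_euclid <- as.vector(dist(x))^2

# Summing the coordinates of d recovers the squared Euclidean distance
all.equal(rowSums(d), sq_euclid)
```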
