3

I have to calculate the pairwise distances of two-dimensional points. These points are stored in a matrix, the first column containing the x values, the second column the y values. The distance function I need is not the euclidean metric though, but some custom metric.

For the euclidean distance, using the rdist() function from the fields package would give me what I want:

require(fields)
a = matrix(c(0,7,3,2,0,8,2,8), nrow=4)
b = matrix(c(2,6,2,6,2,2,6,6), nrow=4)
rdist(a,b)

         [,1]     [,2]     [,3]     [,4]
[1,] 2.828427 6.324555 6.324555 8.485281
[2,] 7.810250 6.082763 5.385165 2.236068
[3,] 1.000000 3.000000 4.123106 5.000000
[4,] 6.000000 7.211103 2.000000 4.472136

To use my own metric, I wrote a simple rdist() replacement that calculates the distance of the points:

my_rdist <- function(a, b) {
    a_len = dim(a)[1]
    b_len = dim(b)[1]
    distmatrix = matrix(data=NA, nrow=a_len, ncol=b_len)
    for(i in seq(1,a_len)) {
        for(j in seq(1,b_len)) {
            distmatrix[i,j] = my_distance( a[i,], b[j,] )
        }
    }
    return(distmatrix)
}

This also works as expected, but it's painfully slow. The my_rdist() function takes about 20 minutes where the rdist() function from the fields package needs less than 2 seconds. (The custom metric at the moment just computes the square of the euclidean distance. The idea is to kind of penalize larger distances in the following processing of my data set.)

Are there any replacements for rdist() I'm not aware of that can handle custom metric functions? Or can you provide me with any hints to speed up my my_rdist()function? I'm pretty new to R so perhaps I've made some obvious mistakes.

z80crew
  • 1,150
  • 1
  • 11
  • 20
  • 1
    This is a question I asked yesterday that is related, which may give some tips: http://stackoverflow.com/questions/23817341/faster-i-j-matrix-cell-fill/23817578#23817578 – Tyler Rinker May 23 '14 at 18:03
  • 1
    Run a `Rprof` to see whether it's your `my_distance` which is the time pig here. – Carl Witthoft May 23 '14 at 18:40
  • @Carl it's indeed `my_distance` that slows down the script. Replacing its body with a simple `return(1)` decreases the runtime down to less than 4 seconds. @Tyler, your question revealed a crucial hint: I used the `x[i]` notation to access the vector elements. Just replacing it with `x[[i]]` decreases the runtime from 20 minutes to less than 2 minutes. – z80crew May 24 '14 at 10:20

0 Answers0