Similar to "canberra distance - inconsistent results", I wrote my own distance calculation. I would now like to run it on a much larger data set and then create a distance matrix from the results.

My initial function is

canb.dist <- function(x, j) sum((abs(x-j))/(abs(x)+abs(j)))

Now, I would like to apply this function to every pair of rows in my data frame, and then create a distance matrix from this calculation. Let's say my data is:

data<-data.frame(replicate(500,sample(1:100,50,rep=TRUE)))

I'm struggling with the next part: how to apply this to every pair of rows and then create a matrix that essentially mimics

dist(data,method="canberra")

I've attempted:

for (y in 1:50)
{
    for (z in 2:50)
    {
    canb.dist(data[y,1:500],data[z,1:500])
    }
}

But this clearly doesn't work, since nothing is stored. Is there a way to run through every pair of rows and build the distance matrix manually?

coderX

1 Answer

You can use combn to generate every pair of row indices and calculate your Canberra distance for each pair. To turn the result into an object of class dist, build a symmetric matrix from the indices and values with sparseMatrix from the Matrix package, then pass it through as.matrix and as.dist.

#OP's data
set.seed(1)
canb.dist <- function(x, j) sum((abs(x-j))/(abs(x)+abs(j)))
data <- data.frame(replicate(500,sample(1:100,50,rep=TRUE)))
refdist <- dist(data, method="canberra")

#convert to matrix
mat <- as.matrix(data)

#sequence of row indices
rowidx <- seq_len(nrow(mat))

#calculate OP's Canberra dist for each pair of rows
triangular <- combn(rowidx, 2, function(x) c(x[1], x[2], canb.dist(mat[x[1],], mat[x[2],])))

#construct the matrix from the indices and values using the Matrix library,
#then convert to a dense matrix before converting to a dist object
#the three blocks of values are the diagonal (zeros), the upper triangle and the lower triangle
library(Matrix)
ansdist <- as.dist(as.matrix(sparseMatrix(
    i=c(rowidx, triangular[1,], triangular[2,]), 
    j=c(rowidx, triangular[2,], triangular[1,]),
    x=c(rep(0, length(rowidx)), triangular[3,], triangular[3,])
)))

#idea from http://stackoverflow.com/questions/17375056/r-sparse-matrix-conversion/17375747#17375747
range(as.matrix(refdist) - as.matrix(ansdist))
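
As a simpler alternative closer to the loop in the question, here is a minimal sketch that fills a preallocated matrix directly and needs no extra package. The names n, loopmat and loopdist are only illustrative, and it assumes the same seed and data as above. The question's loop computes each distance but never stores it; this version stores each value in both triangles.

```r
# same setup as above
set.seed(1)
canb.dist <- function(x, j) sum(abs(x - j) / (abs(x) + abs(j)))
data <- data.frame(replicate(500, sample(1:100, 50, replace = TRUE)))
mat <- as.matrix(data)

# preallocate an n x n matrix of zeros; the diagonal stays 0
n <- nrow(mat)
loopmat <- matrix(0, nrow = n, ncol = n)

# visit each unordered pair of rows once and fill both triangles
for (y in seq_len(n - 1)) {
    for (z in (y + 1):n) {
        d <- canb.dist(mat[y, ], mat[z, ])
        loopmat[y, z] <- d
        loopmat[z, y] <- d
    }
}

# as.dist keeps only the lower triangle
loopdist <- as.dist(loopmat)

# compare against the built-in implementation
all.equal(as.vector(loopdist), as.vector(dist(data, method = "canberra")))
```

This does the same number of canb.dist calls as the combn version (one per pair), so it should not be noticeably slower, and it avoids the round trip through a sparse matrix.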
chinsoon12
  • This worked perfectly. I wasn't thinking it would be as complex as it worked out to be but thank you very much! – coderX Mar 23 '17 at 23:54