I have a distance matrix dMat and want to find the 5 samples nearest to the first one. Which function can I use in R? I know how to find the closest sample (see the 3rd line of code), but can't figure out how to get the other 4.

The code:

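Mat <- replicate(10, rnorm(10))   # 10 x 10 random matrix; each row is a sample
dMat <- as.matrix(dist(Mat))      # Euclidean distances between all pairs of rows
which(dMat[,1]==min(dMat[,1]))    # index of the smallest entry in column 1

The 3rd line of code is meant to find the index of the closest sample to the first sample. (Since dMat[1,1] is 0, the smallest entry in column 1 is sample 1 itself unless it is excluded.)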

Thanks for any help!

Best, Chega

Asked by Chega; edited by Gavin Simpson

2 Answers

You can use order to do this:

head(order(dMat[-1,1]),5)+1
[1] 10  3  4  8  6

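Note that I removed the first element, as you presumably don't want to include the fact that your reference point is 0 distance away from itself. Because dropping it shifts every position down by one, the +1 maps the indices returned by order back to the row numbers of dMat.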
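The same idea generalises to any reference sample i. Below is a minimal sketch; the function name k_nearest is hypothetical (not part of base R), and set.seed(1) is only there so the random matrix is reproducible. Instead of adding 1, it keeps a vector of the remaining row numbers and indexes into it, which makes the index shift explicit.

```r
# Hypothetical helper (not from base R): indices of the k samples nearest to
# sample i, given a full square distance matrix d such as as.matrix(dist(x)).
k_nearest <- function(d, i, k = 5) {
  others <- seq_len(nrow(d))[-i]   # row numbers of every sample except i
  dist_i <- d[-i, i]               # distances from sample i to those samples
  # order() gives positions within dist_i; others[...] maps them back to rows
  others[head(order(dist_i), k)]
}

set.seed(1)                        # only for reproducibility
Mat  <- replicate(10, rnorm(10))
dMat <- as.matrix(dist(Mat))

nn1 <- k_nearest(dMat, 1, 5)
# For i = 1, others is 2:10, so others[j] is j + 1: the same shift as the "+1"
all(nn1 == head(order(dMat[-1, 1]), 5) + 1)   # TRUE
```

k_nearest(dMat, 3, 5) would return the five samples nearest to sample 3 in the same way.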

Answered by James
  • Thanks for this quick response! Allow me one question: I do understand "order" and "head", but what is the purpose of the last term "+1"? – Chega Jan 16 '13 at 10:51

Alternative using sort:

sort(dMat[,1], index.return = TRUE)$ix[1:6]

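It would be nice to call set.seed() before generating the random matrix, so that we could show the results are identical. I'll skip the output here. Note that the first of the six indices is sample 1 itself (distance 0), so the five nearest samples are the remaining five.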
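Following that suggestion, here is a small reproducible check comparing the sort and order approaches. The seed value 42 is arbitrary; any fixed seed works. The expected TRUE relies on the random distances having no ties, which holds for continuous random data in practice.

```r
set.seed(42)                       # any fixed seed makes the matrix reproducible
Mat  <- replicate(10, rnorm(10))
dMat <- as.matrix(dist(Mat))

ix_sort  <- sort(dMat[, 1], index.return = TRUE)$ix[1:6]
ix_order <- head(order(dMat[-1, 1]), 5) + 1

ix_sort[1]                         # 1: sample 1 itself, at distance 0
all(ix_sort[-1] == ix_order)       # TRUE: the other five match the order() answer
```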

Edit: The solution above only works if the first element of the column is always the smallest. That is always true for a distance matrix, whose diagonal is 0, but not for an arbitrary column. Here's a version that always gives the 5 values closest to the first element of the column:

> sort(abs(dMat[-1,1] - dMat[1,1]), index.return=TRUE)$ix[1:5] + 1

Example:

> dMat <- matrix(c(70,4,2,1,6,80,90,100,3), ncol=1)
# James' solution
> head(order(dMat[-1,1]),5) + 1
[1] 4 3 9 2 5 # values are 1,2,3,4,6 (wrong)
# old sort solution
> sort(dMat[,1], index.return = TRUE)$ix[1:6]
[1] 4 3 9 2 5 1 #  values are 1,2,3,4,6,70 (wrong)
# Correct solution
> sort(abs(dMat[-1,1] - dMat[1,1]), index.return=TRUE)$ix[1:5] + 1
[1] 6 7 8 5 2 # values are 80,90,100,6,4 (right)
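On a genuine distance matrix the two approaches coincide: dMat[1, 1] is 0, so abs(dMat[-1, 1] - dMat[1, 1]) is just dMat[-1, 1]. The toy column above differs only because its first entry is 70 rather than 0. A quick check on a random distance matrix (the seed 7 is arbitrary):

```r
set.seed(7)
Mat  <- replicate(10, rnorm(10))
dMat <- as.matrix(dist(Mat))

dMat[1, 1]                                         # 0: diagonal of a distance matrix
all(abs(dMat[-1, 1] - dMat[1, 1]) == dMat[-1, 1])  # TRUE: subtracting 0 changes nothing

arun  <- sort(abs(dMat[-1, 1] - dMat[1, 1]), index.return = TRUE)$ix[1:5] + 1
james <- head(order(dMat[-1, 1]), 5) + 1
all(arun == james)                                 # TRUE on a genuine distance matrix
```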
Answered by Arun
  • Thanks - also for the hint with set.seed() - makes perfect sense! – Chega Jan 16 '13 at 11:04
  • Another alternative for the general case is to return the n+1 closest indices and remove the first, ie `head(order(dMat[,1]),6)[-1]` – James Jan 16 '13 at 12:30
  • 1
    @Arun Ah yes, this would only work if for column n you want to refer to element n. But that is exactly what a distance matrix gives you, since element n of column n is sample n's zero distance to itself. – James Jan 16 '13 at 12:48
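James' "take n + 1 and remove the first" idea extends to every sample at once. The sketch below is an illustration, not code from either answer: the name knn_all is hypothetical, and it removes each sample's own index explicitly instead of dropping the first position. The reason is that with duplicate samples (two rows at distance 0 from each other), the first position in a column is not guaranteed to be the sample itself.

```r
# Hypothetical helper: the k nearest neighbours of every sample.
# d is a full square distance matrix, e.g. as.matrix(dist(x)); requires k < nrow(d).
knn_all <- function(d, k = 5) {
  rows <- lapply(seq_len(ncol(d)), function(j) {
    cand <- head(order(d[, j]), k + 1)  # k + 1 smallest distances in column j
    head(setdiff(cand, j), k)           # drop sample j itself, keep k neighbours
  })
  do.call(rbind, rows)                  # row j holds the neighbours of sample j
}

set.seed(1)
Mat  <- replicate(10, rnorm(10))
dMat <- as.matrix(dist(Mat))
nn   <- knn_all(dMat, 5)
nn[1, ]                                 # same as head(order(dMat[, 1]), 6)[-1]

# With a duplicated sample, dropping the first position can go wrong:
dDup <- as.matrix(dist(rbind(Mat, Mat[1, ])))  # row 11 is a copy of row 1
head(order(dDup[, 11]), 6)[-1]          # keeps 11 itself and drops its twin, 1
knn_all(dDup, 5)[11, ]                  # starts with 1 and excludes 11
```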